Premium factor – Analyzing usage of premium cars compared to conventional cars


Research in Transportation Business & Management xxx (xxxx) xxxx

Contents lists available at ScienceDirect

Research in Transportation Business & Management journal homepage: www.elsevier.com/locate/rtbm

Ulrich Niklas (a, ⁎), Sascha von Behren (b), Christine Eisenmann (c), Bastian Chlond (b), Peter Vortisch (b)

a BMW AG, Petuelring 130, 80788 Munich, Germany
b Institute for Transport Studies, Karlsruhe Institute of Technology (KIT), Kaiserstrasse 12, 76131 Karlsruhe, Germany
c Institute of Transport Research, German Aerospace Center (DLR), Rutherfordstrasse 2, 12489 Berlin, Germany

ARTICLE INFO

Keywords: Car usage profiles; Sensor data; Premium cars; Support vector machine; Supervised learning

ABSTRACT

Car use is affected by various factors (e.g. spatial structures). As a result, the potential for new mobility services and propulsion technologies is perceived to differ depending on these factors. To assess this potential, it is necessary to observe the use of cars by all occupants holistically. This comparative study investigates an important factor that can differentiate car use: the premium factor. To address this issue, a sensor-data-based sample of premium car usage in Germany and California in May 2018 was used. This was compared with a car usage model based on survey data, which generates detailed car trip profiles of conventional cars in Germany and California over the course of one year. Premium cars are produced by premium Original Equipment Manufacturers (Audi, BMW, Mercedes or Tesla). Conventional cars include all vehicles, independent of brand, in a representative population. Specific premium characteristics were identified, such as more frequent long-distance travel, a higher frequency of car use and very broad car usage with no link to specific purposes. For new mobility services and propulsion technologies, premium providers should therefore offer their cars with an extended radius and extended electric range. Furthermore, universal usability of premium cars should be guaranteed.

1. Introduction

The car is a flexible mobility tool compared to other means of transport (e.g. public transit). It can be used in a wide variety of contexts, e.g. commuting, transport of goods, chauffeuring, and vacation trips. To meet their needs for vehicle mobility, individuals can choose between a variety of vehicle types that differ in price, size and comfort. In this context, an interesting question arises: how does usage vary among different types of vehicles (e.g. premium brand, size)? Premium cars are mainly defined by their higher price compared to non-premium cars. In line with the high price, the vehicles are often more comfortable and better equipped with state-of-the-art technology. Typical premium Original Equipment Manufacturers (OEMs) are Audi, BMW, Mercedes or Tesla. Conventional cars include all vehicles, independent of brand, in a representative population. Representative household travel surveys (HTS) are therefore well suited to represent conventional car usage. In particular, premium vehicles demand a higher investment from their owners than conventional vehicles. Does this also mean that these vehicles are used differently? The answer to this question would provide input for car manufacturers on how to design such premium cars and optimize fleet productivity. For example, it can help to integrate new technologies (e.g. the range of electric vehicles) and to dimension the share of the premium fleet allocated to car sharing on the basis of the usage profiles. To address this scientific issue, a comparison of premium cars with a conventional fleet is appropriate.

In transportation research, this topic can be approached from two data-driven directions: data on the travel behavior of persons or sensor data from cars. In HTS, travel behavior with regard to the car is generally recorded at the level of the individual person. As a result, the frequency with which people use their vehicles, and the distances they cover, can be derived. The main problem of these studies is that they neglect the perspective of the vehicle. Cars in households are often used by several people, especially in multi-person households with only one car. As a result, the observable car usage is a combination of the travel behavior of different people. This shows that taking the perspective of the car is an adequate method for some scientific questions regarding car use. For example, research on new technologies, such as electric mobility for cars and the replacement of gasoline-powered vehicles by electric vehicles (EVs), should use this perspective.

⁎ Corresponding author. E-mail addresses: [email protected] (U. Niklas), [email protected] (S. von Behren), [email protected] (C. Eisenmann), [email protected] (B. Chlond), [email protected] (P. Vortisch).

https://doi.org/10.1016/j.rtbm.2020.100456 Received 1 November 2019; Received in revised form 23 December 2019; Accepted 14 February 2020 2210-5395/ © 2020 Elsevier Ltd. All rights reserved.

Please cite this article as: Ulrich Niklas, et al., Research in Transportation Business & Management, https://doi.org/10.1016/j.rtbm.2020.100456


Niklas, et al.

As an advantage, when we consider usage at the car level rather than at the person level, the information of all users is bundled in the car and its usage profile. With this car perspective, one could determine whether all users of a vehicle are suitable for electric mobility or not. Such a car-perspective approach was implemented by Eisenmann (2018). Based on HTS data, car use was modelled from a car perspective for a conventional vehicle fleet in Germany and California over a period of one year. Moreover, the vehicles considered in that study were grouped into eight distinct car usage profiles. For the use of premium vehicles, this approach is not appropriate because the sub-sample of premium vehicles is not large enough to be examined in detail.

Another approach is the analysis of sensor data from a premium car manufacturer in terms of car usage. The main problems with this approach are access to such data and comparability with a conventional car fleet. Therefore, mainly small samples of sensor data are analyzed in the literature, which do not permit robust conclusions. In our study, we are able to combine both approaches to compare premium vehicles (from sensor data) with a conventional car fleet (from survey data) in Germany and California. Because the data collection methods differ, we need a framework that is able to compare car usage from both data sources. The eight car usage profiles from Eisenmann and Buehler (2018) provide this framework. The profiles are abstract enough to allow a comparison between both data sources. With the help of this framework, the aim of our study is to compare the relative sizes of car usage profiles between premium cars and conventional cars to answer the following question:

• Are premium cars used differently to conventional cars?

To answer this question, we will proceed as visualized in Fig. 1. This also describes the structure of the paper. First, we present a comprehensive literature review in which we discuss the different data sources that are used for the investigation of car usage. Second, we present the data sources used in this study. As a third step, we describe the implemented framework in more detail and introduce the specific machine learning method that we use to assign the premium vehicles to the existing usage profiles. Finally, we analyze the premium vehicles in the usage profiles in relation to the conventional fleet in Germany and California. The last section completes this study with a discussion of the limitations and an outlook on subsequent research tasks.

[Fig. 1. Paper structure. Literature review: studies on car usage based on survey data and based on sensor data; comparison of survey and sensor data. Data: sensor data (car usage from premium vehicles) and survey data (car usage from conventional vehicles). Method: framework of eight car usage profiles; variable selection, model fit on labelled data, trained model; assigning premium vehicles to car usage profiles. Results: eight conventional car usage profiles and eight premium car usage profiles; comparison of car usage between premium and conventional vehicles.]

2. Literature review

In the following section we first describe and then compare methods to analyze car usage with “survey data” and “sensor data” based on the available literature. We further present some of the most important results of relevant studies, which we use to validate our own model at a later stage of this research.

There are a large number of publications on the usage of cars by individuals for everyday travel, e.g. Weiß, Chlond, Hilgert, and Vortisch (2016) and Bäumer, Hautzinger, Kuhnimhof, and Pfeiffer (2018). However, these analyses are usually based on HTS, and vehicle usage profiles are recorded for individuals and not for a specific car. An example of this is given by Haustein and Nielsen (2016), who used the Eurobarometer survey with 27,000 respondents to analyze the differences in travel behavior between European countries. They used the k-means algorithm to cluster individuals on the basis of their travel behavior into an interpretable eight-cluster solution. The “busy green drivers” and the “convenience drivers” were estimated at 61.5% of all observed households in West Germany.

Another data source for analyzing car usage from a longitudinal perspective is travel surveys over several weeks, such as the travel surveys Mobidrive and Thurgau. Those were conducted in Germany and Switzerland over six weeks in 1999 and 2003 (Axhausen, Zimmermann, Schönfelder, Rindsfüser, & Haupt, 2002; Löchl, 2005). Findings of those surveys indicate that the transport mode choice of individuals is fairly stable when carrying out the same activities (e.g. shopping) and highly stable when frequenting the same destinations (e.g. supermarkets) (Schönfelder, 2006; Susilo & Axhausen, 2014). Bäumer et al. (2018) compare annual car mileages since 2000 between different German surveys: the German Mobility Panel (MOP), the Mobility in Germany (MiD) survey and the German vehicle mileage survey (Fahrleistungserhebung, FLE). They found that annual average car mileages in Germany have been quite stable over the last two decades, between 13,000 km and 15,000 km. Eisenmann (2018), who analyzed car usage from a vehicle perspective, differentiates the average monthly mileage in Germany according to the position of the car in the household (e.g. first car in a multi-car household, second car in a multi-car household) based on the MOP. Cars in a multi-car household have an average monthly mileage of around 1500 km. Pasaoglu et al. (2012) analyzed car usage patterns derived from different surveys in six European countries. The average daily mileage in Germany was found to be between 50 km and 60 km depending on the day of the week. Furthermore, the average number of car trips per day was determined to be approximately 2.0 for Germany.

In the recent past, sensor data has been increasingly used in research. Here, vehicle usage can be observed directly in the data from a vehicle perspective. Elango, Guensler, and Ogle (2007) and Pearre, Kempton, Guensler, and Elango (2011) analyzed car usage profiles that were collected by GPS in the Atlanta metropolitan region over up to 52 weeks in 2004. The sample comprises about 500 cars. Pearre et al. (2011) show an average daily mileage of 32.6 miles (including days without driving). Elango et al. (2007) show that cars tend to be used more variably when owned by multi-car households, by multi-person households (often with children or students), and by households with higher incomes. Khan and Kockelman (2012) collected data from 419


cars from 255 households between November 2004 and April 2006. Based on 269,357 trip records and 143,004 vehicle-day records, the average vehicle miles travelled (VMT) per day was 25.4 miles. Schuessler and Axhausen (2009) enriched raw GPS data from 4882 participants in Switzerland with further information about the mode choice or purpose of a trip. Based on individual speed and acceleration, they derived additional information at the trip level. Kullingsjö and Karlsson (2012) provide an overview of existing GPS-based mobility studies. This overview clearly shows that the sample size of GPS-based data is often very small and that the analyses are limited to selected cities.

Comparisons of sensor data with travel survey data also exist in the literature. Stopher, FitzGerald, and Xu (2007) investigated the precision of the Sydney HTS with an in-vehicle GPS survey of 118 individuals from 70 households. The study achieved trip-level data collection by using the same households to compare individual trip records from the HTS with the corresponding trip records from the GPS survey. It was shown that the average Vehicle Kilometers Travelled (VKT) recorded by the HTS was 8.85 km per trip, whereas the average VKT recorded by the GPS was 8.07 km. This indicates an average over-reporting of 0.78 km per trip by HTS respondents. Furthermore, 7.4% of trips were not reported by the HTS participants at all. However, this may be due to individuals conflating several short trips into longer ones. Wolf, Oliveira, and Thompson (2003) investigated the impact of under-reporting on modelled VMT. They compared estimates of VMT derived from traditional travel survey results from three regions in California with GPS data from the same households. Based on the data, the study concluded that a larger number of trips went unreported during off-peak periods. However, no significant patterns of under- or over-reporting between the travel survey and sensor data results could be recognized. In San Diego the average modelled VMT based on CATI (Computer-Assisted Telephone Interviewing) is 6.9, compared with an average GPS distance of 5.8. In Alameda the relation is the other way around, and in Sacramento the average VMT based on CATI is equal to the GPS distance. Auer, Bogenberger, Rehborn, Koller, and Palmer (2017) analyzed individual travel behavior characteristics from a car perspective and compared them with the outcome of different HTS. Due to data limitations they only showed general key figures, for example the average driving time, average distance or average speed for the vehicle fleet. Individual travel behavior over a longer period of time and specific car usage profiles could not be determined. Kelly, Krenn, Titze, Stopher, and Foster (2013) provide a review of studies comparing self-reported and GPS-measured journey durations. All reviewed studies concluded that self-reported journey times were over-estimated. Data collection was commonly conducted over one day, although some studies had an observation period of up to one week. A limitation of this review was that only the differences in driving times between GPS and survey data were examined. Differences in driving distances were not considered.

The main limitation of studying car usage patterns using sensor data is related to their general availability. Most sensor-data-based analyses are too specific to reflect a whole market. Furthermore, the sample size is often small and the data depend on specific cities and car types. As a result, the data are mostly not representative. The main reason is the expensive process of collecting adequate data. A further problem is the lack of information regarding the purpose of a trip (Bricka, Sen, Paleti, & Bhat, 2012; Eisenmann & Buehler, 2018; Elango et al., 2007; Kelly et al., 2013; Wolf et al., 2003). HTS, in contrast, can lack the required accuracy. Due to self-reporting, some trips do not appear in the available dataset. Respondents tend to forget to report some trips or rate them as unimportant (Bricka et al., 2012). Data from traditional household surveys are not suitable for an analysis of car usage profiles, as they only collect usage information from a single individual, whereas in reality a car can be used differently by different individuals in the household. Furthermore, the observation period is often limited to a maximum of one week (Kullingsjö & Karlsson, 2012).

The model by Chlond, Weiss, Heilig, and Vortisch (2014) and Eisenmann (2018) addresses this issue. For their model they used survey data to represent the individual usage of different household members. In a further step, Eisenmann and Buehler (2018) clustered the modelled car usage into eight profiles. However, due to the small sample size of premium vehicles, it is difficult to bring out the differences in usage between premium and conventional cars. Nevertheless, their results are an appropriate database for comparing sensor data with survey data, because they consider mobility from a vehicle perspective and because the attributes of the eight car usage profiles can be derived from sensor data. Using sensor data with a large sample size in Germany and California enables us to extend the framework of Eisenmann and Buehler (2018). Following an initial idea from Schuessler and Axhausen (2009), the aim of this study is to identify differences and similarities between the usage patterns of premium and conventional cars.

3. Data

For this study we used two data sources. The first is the dataset CUMILE (Car Usage Model Integrating Long-distance), a model based on survey data that represents car use over one year for conventional car fleets in Germany and California. The second is a sensor-based dataset of premium cars collected by a German premium Original Equipment Manufacturer over a period of one month in Germany and California. Both datasets (survey and sensor data) include privately and commercially used cars.

3.1. Survey data

The survey data contain 1659 conventional vehicles with 1,207,115 trips in Germany, and 1681 conventional vehicles with 2,084,508 trips in California. CUMILE was initially developed for Germany (CUMILE-GER) using the MOP and INVERMO, a survey on long-distance (LD) trips, as input data. The MOP is a German national HTS which consists of the two parts “Everyday Mobility” (MOP-EM) and “Fuel Consumption and Odometer Reading” (MOP-FCOR):

• For the MOP-EM, 1000–1500 households report their daily travel patterns over a period of one week. In addition, data on the sociodemographic characteristics of households (e.g. number of cars per household, net household income) and household members (e.g. sex, age, driving license ownership) are also collected (Ecke et al., 2018; Zumkeller, 2009).

• For the MOP-FCOR, car-owning households within the MOP-EM survey are asked to report dates, total car mileage, and fuel consumption for each refueling during eight weeks in spring. Car-related data, such as fuel type and engine power, are also collected (Ecke et al., 2018).

In the INVERMO survey, approximately 10,000 participants in Germany were asked to report detailed information on their last three LD trips, including distance, modes used, day of departure, day of arrival, and trip purpose (Zumkeller, Chlond, Last, & Manz, 2006). The survey was conducted from 2000 to 2002. The CUMILE-GER algorithm models car usage over one year by determining the following four data points:

1. Individual car travel data of survey participants during the MOP-EM week.
2. Car usage for typical days of the year.
3. Car usage per day during the period of the MOP-FCOR survey.
4. Car usage per day for the remaining days of the year.

For the comparison of car usage between Germany and California, Eisenmann and Buehler (2018) modelled car usage in California in CUMILE-CAL. The modeling concept of CUMILE-CAL is very similar to


the original model CUMILE-GER. Eisenmann and Buehler (2018) used California Household Travel Survey (CHTS) data from 2010 to 2012. The CHTS data included GPS and On-Board Diagnostic (OBD) car trip surveys of one week, as well as LD travel surveys.

3.2. Sensor data

The sensor data available for new premium vehicles in this study are odometer based and transmitted automatically without any action from car users. After the vehicle had covered about 150 km, car users were asked whether they agreed to the data collection. This agreement did not entail any financial advantage. The underlying sensor data are displayed on several frontends. For this purpose, the technology transmits the current status of the vehicle to a backend system. The interaction between the sensor data and the backend system is event based. The telematics control unit starts collecting all the relevant data via the controller area network and diagnosis jobs. When the limit of 500 collected events or 5 min is reached, the data package is transmitted to the backend system.

From the collected data, we filtered the relevant trip information and transformed the sensor data into the data structure shown in Table 1. The variables VAN and session ID allow us to match trips to a specific vehicle. With this information we can determine the car usage within the observation period. Sensor data is triggered by a pre-defined set of events (trigger). Based on the triggers we are able to define the beginning and the end of a trip. We calculated the distance of a single trip as the difference between the odometer values of the previous and the current trigger.

Table 1
Sensor data: Data structure.

Variable name                      Description                                            Data format
Vehicle Anonymized Number (VAN)    Unique car ID                                          String
Session ID                         Unique trip ID                                         String
Daystamp                           Day of a trip                                          yyyy-MM-dd
Trigger                            Pre-defined set of events                              Categorical
Odometer                           Driven kilometers of the vehicle                       Integer
Seat occupancy                     People who fastened their seatbelts during the trip    Categorical

A significant advantage of sensor data is that information such as functional usage statistics is collected automatically. In our study, we used the variable seat occupancy, which describes how many people fastened their seatbelts during the trip. If a seatbelt is fastened at least once during a trip, the seat is counted as occupied. If a person does not fasten their seatbelt during the trip, the passenger is not recognized. Seat occupancy has four variants:

• 1_0: only the driver's seat is occupied,
• 1_X: driver's seat and at least one rear seat are occupied,
• 2_0: driver and passenger seat are occupied,
• 2_X: driver and passenger seat and at least one rear seat are occupied.

After selecting the relevant data from the comprehensive sensor information, the next step is the validation of the remaining sensor data. This is an important process to remove incorrect data points. Data must be cleaned, treated and visualized to ensure reliability and representativeness. In this study we implemented the following criteria to validate the data. First, in order to ensure that no new vehicles were added during the observation period, vehicle movement data must already have been transmitted before the observation period started. Second, we excluded all vehicles with “implausible” movement data. This was possible due to the volume of data. All vehicles that showed extreme outliers regarding distances covered during the observation period (average trip lengths over 200 km) were removed from the dataset. The reason for these outliers could be an error in the data transfer or the inclusion of cars that are used for passenger transport. These vehicles and their car usage patterns are not the focus of our research question, and thus we have excluded them from our study data. In total, 0.17% of the observed vehicles with 4394 trips were excluded using the criteria described above.

With the aim of demonstrating the feasibility of assigning premium vehicles to the car usage profiles of Eisenmann and Buehler (2018), vehicle sensor data was collected over a period of one month (May 2018) in a proof-of-concept study. In total, data from 74,568 premium vehicles with 5,380,763 individual trips in Germany and California with credible odometer readings was available for this study. In Germany we analyzed 45,590 vehicles with 2,950,967 trips. In California we analyzed 28,978 vehicles with 2,429,796 trips.

3.3. Data comparison

In order to better understand the data, we compared key figures such as VKT and trips per vehicle with the results of the existing studies mentioned in the literature review. Bäumer et al. (2018) observed an annual mileage between 13,000 km and 15,000 km. If we assume a linear relationship between monthly and annual mileage, our data show an average annual mileage of 17,676 km (1473 km per month). This higher average is likely due to the fact that the dataset consists of younger premium cars. Eisenmann (2018) identified an average monthly mileage of around 1500 km. This corresponds to the monthly mileage observed in our study for new premium cars. Pasaoglu et al. (2012) reported an average daily mileage in Germany between 50 km and 60 km depending on the day of the week. The average daily mileage of 61.3 km in our dataset of premium vehicles is slightly above this range. The “premium factor” may explain this high value. Furthermore, the average number of car trips per day in Germany was determined to be approximately 2.0, which is close to our estimate of 2.2 trips per day.

In summary, the data of this study are consistent with the data of previous investigations, as no serious deviations from the considered studies could be detected. The higher values regarding driving distances can be explained by the younger age and the premium factor of the observed vehicles.
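As an illustration of the trip-distance calculation and the plausibility filter described in Section 3.2, the following is a minimal sketch and not the authors' implementation. The record layout `(van, session_id, odometer_km)`, the function names and the example values are assumptions made for this example; only the odometer-difference rule, the 200 km threshold on average trip length and the linear extrapolation of 1473 km per month come from the text.

```python
from collections import defaultdict

def trip_distances(records):
    """Distance per trip as last-minus-first odometer reading of a session.

    `records` is a hypothetical, time-ordered list of
    (van, session_id, odometer_km) trigger rows.
    """
    first, last = {}, {}
    for van, session_id, odometer_km in records:
        key = (van, session_id)
        first.setdefault(key, odometer_km)  # odometer at the first trigger
        last[key] = odometer_km             # odometer at the latest trigger
    per_vehicle = defaultdict(list)
    for key, start in first.items():
        per_vehicle[key[0]].append(last[key] - start)
    return dict(per_vehicle)

def plausible_vehicles(per_vehicle, max_avg_trip_km=200):
    """Drop vehicles whose average trip length exceeds the threshold."""
    return {
        van: trips
        for van, trips in per_vehicle.items()
        if sum(trips) / len(trips) <= max_avg_trip_km
    }

def annualize(monthly_km):
    """Linear extrapolation from one month to one year."""
    return monthly_km * 12

# Made-up example: vehicle A makes two short trips, vehicle B one 450 km trip.
records = [
    ("A", "s1", 1000), ("A", "s1", 1012),
    ("A", "s2", 1012), ("A", "s2", 1030),
    ("B", "s1", 500), ("B", "s1", 950),
]
distances = trip_distances(records)
kept = plausible_vehicles(distances)
annual_km = annualize(1473)  # monthly mileage reported in Section 3.3
```

In this made-up example, vehicle B is removed because its only trip exceeds the 200 km average, and the extrapolation reproduces the annual figure of 17,676 km quoted in the data comparison.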

4. Method

In this section we present the framework of eight car usage profiles and their characteristics. Afterwards, we describe the selection of the algorithm, its formal implementation and its validation.

4.1. Framework

Using the car usage data of CUMILE-GER and CUMILE-CAL, Eisenmann and Buehler (2018) applied a hierarchical cluster analysis to identify and interpret different car usage profiles. To form the clusters, the study took both the car use intensity and the car use variability into account. The authors clustered the cars in an unsupervised manner based on the following four car usage characteristics (input variables):

1. The number of days per year without car use.
2. The number of days per year with daily mileage between 1 and 10 miles.
3. The number of days per year with daily mileage of more than 100 miles.
4. The proportion of work days versus non-work days for car use.

This clustering approach resulted in eight different car usage profiles:

1. Standing cars: Not used on average for 256 days per year
2. Moderate-range cars: Used quite regularly during the year, both on weekdays and weekends
3. Day-to-day cars: Used on most days of the year (on average 36 days per year without use)
4. Workday cars: Used almost entirely on workdays (on average 96% of days with car use are workdays)
5. Weekend cruisers: Disproportionately used on weekends
6. Long-distance (LD) cars: Most widely used on long-distance journeys
7. Short-haul cars: Cover distances between 1 and 10 miles per day on 201 days a year on average
8. All-rounders: Midfield in all categories used for clustering
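The four input variables listed above can be derived from a vehicle's daily mileage. The sketch below is only an illustration under stated assumptions: the function name `usage_features` and the input layout (a dict from date to miles) are hypothetical, workdays are taken to be Monday to Friday without public holidays, and the fourth variable is interpreted as the share of days with car use that fall on workdays.

```python
from datetime import date, timedelta

def usage_features(daily_miles):
    """Compute the four cluster-forming variables for one vehicle.

    `daily_miles` maps datetime.date -> miles driven on that day
    (hypothetical input layout, one entry per observed day).
    """
    days_unused = sum(1 for m in daily_miles.values() if m == 0)
    days_short = sum(1 for m in daily_miles.values() if 1 <= m <= 10)
    days_long = sum(1 for m in daily_miles.values() if m > 100)
    used_days = [d for d, m in daily_miles.items() if m > 0]
    # Assumption: Monday (0) to Friday (4) count as workdays.
    workdays_used = sum(1 for d in used_days if d.weekday() in range(5))
    workday_share = workdays_used / len(used_days) if used_days else 0.0
    return {
        "days_unused": days_unused,
        "days_1_to_10_miles": days_short,
        "days_over_100_miles": days_long,
        "workday_share": workday_share,
    }

# Made-up week starting on Monday, 7 May 2018.
monday = date(2018, 5, 7)
miles = [5, 0, 120, 8, 30, 0, 150]
week = {monday + timedelta(days=i): m for i, m in enumerate(miles)}
features = usage_features(week)
```

Days with a mileage above 0 but below 1 mile fall into none of the distance bins here; how such days were treated in the original clustering is not stated in the text.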

the decision to include VKT as additional variable in the model in order to better differentiate the usage of new premium vehicles. However, based on the fact that the deviation between quadratic and cubic SVM is quite small and to avoid over-fitting of the data, this study selected to use quadratic SVM. Table 2 Model selection.

A detailed description of the profiles is given in the results by Eisenmann and Buehler (2018). We now use these eight car usage profiles as a framework for our comparison. This means that we want to assign the premium cars from the sensor data to these profiles to compare them with the conventional car fleet. The profiles were used as labels to apply an allocation model for the premium vehicles from sensor data. For this process, we have to convert the sensor data into the same four input variables as described above. In addition to the four input variables, we have summarized the VKT from each day to determine the total driven distance for every vehicle. Therefore, the model is trained on five input variables. In the following, we describe the whole process of model assignment in more detail.

Method

Accuracy

Linear Discriminant Analysis Quadratic Discriminant Analysis Linear Support-Vector-Machine Quadratic Support-Vector-Machine Cubic Support-Vector-Machine Fine Tree Medium Tree Coarse Tree

89.1% 90.5% 91.5% 95.5% 96.3% 92.6% 94.0% 61.2%

The basic theory of SVM comprises on finding optimal classification hyperplanes. These hyperplanes maximize the margin between the clusters. An advantage of SVM compared to other cluster algorithms is the possibility to create such hyperplanes based on a relatively small subset of data near the hyperplanes. In other words, the algorithm is robust to outliers which is relevant by using different datasets. (James et al., 2017). In this study, SVM includes the labeled data from the framework to create an optimal classification hyperplane (Ding, Zhu, & Zhang, 2017). As a first step, we trained the model with the conventional vehicles (observation period over one year) and then predicted the assignment of premium vehicles (observation period over one month). For this, we used a z-transformation to standardize the clusterforming variables. On this basis, we were able to cluster the premium (x X¯ ) vehicles with z-values. In general, the z-score (z = S ) measures the deviation from data points from the mean X in terms of the standard deviation S. The standardized dataset has mean 0 and standard deviation 1. Z-transformed values retain to the shape properties of the original dataset and have the same skewness and kurtosis. The SVM algorithm was originally designed for binary classification problems; k = 2 classes. To predict k > 2 classes and to avoid unclassifiable regions, we implemented the error correcting output code (ECOC). This reduces our multiclass problem to a series of binary problems. Each row of the Coding Matrix M ∈ {1, −1, 0} describes a car usage profile, each column corresponds to a binary learner. The number of hyperplanes is defined by k (k 1) . In our case, the size of the coding 2 matrix is 8 × 28. For this study, we needed to train 28 binary classification sub-problems instead of eight multiple classification problems. For more details on the ECOC approach, we recommend Dietterich and Bakiri (1995). 
After defining each class with a binary ECOC-Matrix, we used the one-vs-one approach. This method compares all classes individually with each other. The resulting label of a test set observation is the label that was most frequently assigned to this observation (Abe, 2010; James et al., 2017). The model input is given by the labeled dataset of conventional 5 and vehicles C (l) = { (x i (l ) , yi (l) ) } n with x i (l) = (x1;i (l) ,…, x5; i (l) )

4.2. Assignment of premium vehicles in car usage profiles As a first step and in order to take into account the observed / identified higher values regarding driving distances of premium vehicles, VKT are also considered in the classification process. This serves for better comparability with the sensor data. As VKT is not a modelled value from CUMILE but a statement of the respondents in the survey, the use of the VKT brings a further advantage. By considering VKT, the classification model determines the influence of VKT based on the assignment of Eisenmann and Buehler (2018). Consequently, cars with a higher mileage can be allocated to the corresponding clusters. This considers the special nature of new premium vehicles and attempts to counteract for any possible data bias. As a summary, for the characterization of the car usage profiles we now use the following five clustering-forming variables: 1. The number of days per month without car use, 2. The number of days per month with daily mileage between 1 and 10 km, 3. The number of days per month with daily mileage of more than 100 km, 4. The proportion of work days versus non-work days for car use and 5. VKT during the observation period For this reason, it is not possible to reproduce the Eisenmann and Buehler (2018) clustering model by using the same cluster method (hierarchical clustering and k-means) and by assigning premium vehicles based on centroids. Instead, we have selected an alternative allocation algorithm as the main objective and for the comparison of premium vehicles with the conventional market. We used a method of machine learning to assign premium vehicles to car usage profiles. To apply the most suitable algorithm for the present dataset, we tested different supervised methods to train our model and assessed them on their accuracy. Accuracy is defined by the fraction of predictions the model got correct. The results are shown in Table 2. 
All linear methods have a lower accuracy than the quadratic Support Vector Machine (SVM). Only the cubic SVM achieves a higher accuracy, at 96.3%. However, it is often better to use a hyperplane that does not perfectly separate all data into classes, because this gives greater robustness to individual observations (James, Witten, Hastie, & Tibshirani, 2017). The allocation method used by Eisenmann and Buehler (2018) brings only a small advantage. By modeling with five variables, we achieve an accuracy of more than 95%. This confirms

The corresponding labels are $y_i^{(l)} = (y_{1;i}^{(l)}, \ldots, y_{s;i}^{(l)}) \in \{1, 2, \ldots, 8\}$. The aim of our algorithm is to construct a rule $y_i^{(l)} = f(x_i^{(l)})$ with $f: \mathbb{R}^s \to \{1, 2, \ldots, 8\}$ such that, given a new sample $x_j^{(u)} \in \mathbb{R}^d$, we can predict its label, i.e. $y_j^{(u)} = f(x_j^{(u)}) \in \{1, 2, \ldots, 8\}$, to generate a labeled dataset of premium vehicles $P^{(u)} = \{(x_j^{(u)}, y_j^{(u)})\}_{j=1}^{n}$.

$x_i^{(l)}$ is the feature vector of the four variables described above and the additional VKT variable for the ith conventional car. $y_i^{(l)}$ includes the usage profile for the ith conventional car. $x_j^{(u)}$ is the feature vector of the five cluster-forming variables for the jth premium car.
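The one-vs-one rule, in which one binary classifier is trained per pair of classes and the most frequently assigned label wins, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB model: the binary learner is a nearest-centroid stand-in rather than a quadratic SVM, the toy data are two-dimensional rather than five-dimensional, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def centroid(points):
    """Component-wise mean of a list of equally long vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def make_rule(label_a, centroid_a, label_b, centroid_b):
    """Binary stand-in classifier: pick the class with the nearer centroid."""
    def rule(x):
        return label_a if sq_dist(x, centroid_a) <= sq_dist(x, centroid_b) else label_b
    return rule


def fit_pairwise(X, y):
    """Train one binary rule for every unordered pair of classes."""
    classes = sorted(set(y))
    centroids = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in classes}
    return {(a, b): make_rule(a, centroids[a], b, centroids[b])
            for a, b in combinations(classes, 2)}


def predict_one_vs_one(rules, x):
    """Return the label assigned most often; ties go to the smallest label."""
    votes = Counter(rule(x) for rule in rules.values())
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


if __name__ == "__main__":
    # Toy labeled set standing in for the conventional cars (three classes).
    X = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 10), (1, 11)]
    y = [1, 1, 2, 2, 3, 3]
    rules = fit_pairwise(X, y)
    print(len(rules), predict_one_vs_one(rules, (0.2, 0.5)))
    print(comb(8, 2))  # number of pairwise classifiers for eight profiles
```

With eight usage profiles the scheme needs one classifier for each of the 28 class pairs, and every premium car is labeled by the majority vote over those classifiers.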


[Fig. 2, confusion matrix: rows show the true classes and columns the predicted classes for the car usage profiles 1 Standing cars, 2 Moderate-range cars, 3 Day-to-day cars, 4 Workday cars, 5 Weekend cruisers, 6 LD cars, 7 Short-haul cars, 8 All-rounders.]
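The positive predictive value and false discovery rate named in the caption of Fig. 2 can be read off a confusion matrix column by column. The sketch below assumes the usual definitions (positive predictive value = correctly predicted observations of a class divided by all observations predicted as that class; false discovery rate = 1 minus that value) and uses a made-up 3x3 matrix, not the study's data. The function names are hypothetical.

```python
def ppv_and_fdr(confusion):
    """confusion[t][p] counts observations of true class t predicted as class p.

    Returns one (ppv, fdr) pair per predicted class; (None, None) if a class
    was never predicted.
    """
    k = len(confusion)
    result = []
    for p in range(k):
        column_total = sum(confusion[t][p] for t in range(k))
        if column_total == 0:
            result.append((None, None))
            continue
        correct = confusion[p][p]
        result.append((correct / column_total,
                       (column_total - correct) / column_total))
    return result


def accuracy(confusion):
    """Fraction of all predictions that the model got correct."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


if __name__ == "__main__":
    toy = [[90, 5, 5],
           [6, 80, 4],
           [4, 15, 91]]  # illustrative counts only
    for cls, (ppv, fdr) in enumerate(ppv_and_fdr(toy), start=1):
        print(cls, round(ppv, 2), round(fdr, 2))
    print(round(accuracy(toy), 2))
```

Because each column is normalized separately, a class with few predicted members can show a low positive predictive value even when the overall accuracy is high.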

$y_j^{(u)}$ includes the predicted usage profile for the jth premium car. To enlarge our feature space $\mathbb{R}^d$ and thereby establish a non-linear boundary with more flexibility between the classes, we implemented a quadratic kernel $K(x_i, x_{i'}) = (\langle x_i, x_{i'} \rangle + d)^2$ (James et al., 2017). To validate the calculated hyperplanes $f(x_i^{(l)})$, we implemented a confusion matrix, which contains the positive predictive values as a measure of accuracy. In this matrix, the rows represent the true classes of the data and the columns the predicted classes. The diagonal describes the correctly classified examples, while the other entries indicate confusions. In Fig. 2 we present the positive predictive values and false discovery rates for all classes. The lowest accuracy is in class 6 (LD cars), with a positive predictive value of 93%. This class also has the smallest cluster size. The highest accuracy is in class 7 (short-haul cars), with a positive predictive value of 99%.

We assume that the clusters of conventional cars $\{(x_i^{(l)}, y_i^{(l)})\}_{i=1}^{3{,}340}$ are independent and identically distributed. With the help of the quadratic SVM, we can now assign the premium vehicles to car usage profiles and generate the labeled sensor dataset $P^{(u)} = \{(x_j^{(u)}, y_j^{(u)})\}_{j=1}^{74{,}568}$. Concerning the one-vs-one approach, we construct $\binom{8}{2} = 28$ SVMs. To evaluate the prediction of $y_i^{(l)} = f(x_i^{(l)})$ and to protect it against overfitting, we implemented a 5-fold cross-validation. 5-fold cross-validation is a statistical technique that divides the data into five equal subsets. In each of five rounds, the model is trained on four subsets and evaluated on the remaining subset. The algorithm was built using MATLAB.

| Predicted class | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Positive predictive value | 94% | 97% | 96% | 95% | 94% | 93% | 99% | 96% |
| False discovery rate | 6% | 3% | 4% | 5% | 6% | 7% | 1% | 4% |

Fig. 2. Positive predictive value / false discovery rate.

5. Results

Based on the allocation algorithm, premium cars were allocated to different car usage profiles. In the following section we present the results. To ensure the general comparability of the two data sources, we first compare their five cluster-forming variables using mean values. Second, we plot the distances covered over an equal period of time (one month). Next, we describe the car usage characteristics for each profile and compare the relative cluster sizes of premium vehicles with those of conventional vehicles. Finally, we use additional information from the premium vehicles and describe the average distribution of seat occupancy among the usage profiles.

Table 3 presents the relative means of car usage characteristics by vehicle type and country. To generate comparable values, we divided the total number of days by the observation period. The relative number of days on which the car was not used was found to be greater for conventional vehicles than for premium vehicles. This is likely because the premium vehicles are mainly young cars, whereas the conventional vehicles also include older ones. The relative number of days on which the car is used for over 100 km is higher for premium cars than for conventional cars. On the one hand, this could be due to the premium factor; on the other hand, many conventional cars are not used for LD travel (e.g. second cars in a household or old and small cars). In addition, the modelled LD travel in CUMILE-GER may slightly underestimate the actual LD travel. This is supported by the similar LD travel data in CUMILE-CAL, which is calculated from non-modelled CHTS data. The same picture emerges in the comparison of the monthly distances covered (premium vehicles in California: 1523 km; conventional vehicles in California: 1572 km). The proportion of car usage on workdays is similar for both data sources.

Table 3 Relative means of car usage characteristics.

| Vehicle type | Country | Relative days car not used | Relative days car used between 1 and 10 km | Relative days car used over 100 km | Share of work days with car use | Monthly car mileage (km) |
|---|---|---|---|---|---|---|
| Premium vehicles | Germany | 26% | 7% | 17% | 77% | 1804 |
| Premium vehicles | California | 20% | 7% | 14% | 77% | 1523 |
| Conventional vehicles | Germany | 39% | 17% | 7% | 79% | 1002 |
| Conventional vehicles | California | 34% | 11% | 13% | 75% | 1572 |

Fig. 3 shows the distribution of VKT for premium and conventional cars. In general, the distributions of both data sources have similar curves. Both follow a lognormal distribution, which reflects typical car usage. Differences exist for cars that covered longer distances within the observation period. The cumulative distribution of conventional cars converges to 100% much faster than that of premium cars. This difference is likely due to the premium factor and the possible underestimation of LD travel in CUMILE-GER.

Fig. 3. Distribution of VKT between premium and conventional cars (monthly). [x-axis: 0 to 4,000; y-axes: VKT (in %) and Cumulative VKT (in %); series: premium cars and conventional cars, each as VKT and cumulative VKT.]

The car usage profiles of premium cars and their characteristics are shown in Table 4.

• The standing cars cluster includes vehicles that were not moved on nearly half of the observed days. In general, when these cars are moved, they are mainly used on workdays and for LD trips (average distance per trip: 36 km, monthly mileage: 1485 km).
• The moderate-range cars cluster describes vehicles with below-average use. They have no major outliers in their usage behavior.
• Day-to-day cars are used nearly every day. This cluster has the lowest average number of days on which the car is not used.
• Workday cars and weekend cruisers are cars that are mainly used on weekdays and on weekends, respectively.
• LD cars and short-haul cars are determined based on trip lengths. LD cars cover distances of over 100 km on an average of 13 days out of 30 observed days. They have the highest average trip length (31 km) and record the most trips in the observation period (average number of trips: 98).

transport is very well developed, especially for LD travel between two big cities (e.g. Munich to Stuttgart). As a result, individuals replace their older and less comfortable car with the train, especially for longer distances. The difference can also be explained by the generally higher proportion of commercially used premium cars in Germany. The higher quality of public transport in Germany also explains the higher share of day-to-day cars in California (GER: 12%, CAL: 18%). Especially in urban areas, individuals can use other means of transport (e.g. public transport) to cover daily trips. Workday cars have a bigger share in Germany (14%) than in California (8%). This could be explained by the large share of business cars in Germany. Conventional business cars can often only be used for work purposes. Premium business cars, which are associated with a higher occupational status, may also be used for both business and private purposes. In general, new premium cars are similar in usage, regardless of their spatial context. Over the life cycle of a car, vehicles migrate to certain specialized clusters, especially in Germany. The link between vehicle age and car usage, and the related similar use of the observed new premium vehicles, is another argument for extending the study dimensions to include VKT during the observation period. This enables us to separate premium vehicles and their use more clearly.

The use of these car usage profiles has the potential to lead to an overlap of certain profiles, for example weekend cruisers, LD cars and short-haul cars. By expanding the study dimensions to include VKT, these profiles can be distinguished more precisely. In our study, weekend cruisers (average distance: 29 km) cover shorter distances than LD cars (average distance: 35 km), and short-haul cars were mostly used for short trips (average distance: 16 km).

Table 4 Premium cars: Usage characteristics of the different clusters in Germany and California. Values are means, with standard deviations in parentheses.

| Cluster name | No car use on a whole day (days) | Daily mileage below 10 km (days) | Daily mileage of more than 100 km (days) | Share of workdays | Monthly mileage (km) |
|---|---|---|---|---|---|
| Standing cars | 14 (5) | 1 (1) | 5 (4) | 88% (8%) | 1485 (1034) |
| Moderate-range cars | 4 (3) | 1 (1) | 3 (2) | 77% (5%) | 1599 (631) |
| Day-to-day cars | 2 (2) | 3 (1) | 4 (3) | 72% (3%) | 1716 (795) |
| Workday cars | 10 (4) | 2 (2) | 4 (3) | 91% (4%) | 1451 (758) |
| Weekend cruisers | 14 (4) | 2 (1) | 3 (3) | 68% (12%) | 1070 (744) |
| LD cars | 3 (3) | 1 (1) | 13 (4) | 75% (6%) | 3291 (1179) |
| Short-haul cars | 6 (4) | 9 (2) | 2 (2) | 77% (7%) | 942 (662) |
| All-rounders | 8 (3) | 4 (1) | 4 (4) | 78% (6%) | 1386 (949) |
| Mean usage | 7 (5) | 2 (2) | 5 (5) | 77% (9%) | 1695 (1088) |

Table 5 compares the sizes of the car clusters between premium and conventional cars in Germany and California.

Table 5 Comparison of car cluster sizes in Germany and California.

| Cluster name | Germany: Premium cars, frequency | Germany: Premium cars, share | Germany: Conventional cars, share | California: Premium cars, frequency | California: Premium cars, share | California: Conventional cars, share |
|---|---|---|---|---|---|---|
| Standing cars | 5187 | 11% | 16% | 2842 | 10% | 13% |
| Moderate-range cars | 12,971 | 28% | 17% | 10,142 | 35% | 22% |
| Day-to-day cars | 3865 | 8% | 12% | 2227 | 8% | 18% |
| Workday cars | 2182 | 5% | 14% | 953 | 3% | 8% |
| Weekend cruisers | 6315 | 14% | 15% | 3737 | 13% | 16% |
| LD cars | 6354 | 14% | 3% | 4010 | 14% | 14% |
| Short-haul cars | 2057 | 5% | 9% | 1224 | 4% | 3% |
| All-rounders | 6659 | 15% | 14% | 3843 | 13% | 6% |
| Total sample | 45,590 | 100% | 100% | 28,978 | 100% | 100% |
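The premium shares in Table 5 follow directly from the cluster frequencies. As a worked example, the short sketch below recomputes them from the frequencies reported in the table; the dictionary layout and the function name `cluster_shares` are illustrative choices, and rounding to whole percent is assumed.

```python
PREMIUM_FREQUENCY = {
    "Germany": {
        "Standing cars": 5187, "Moderate-range cars": 12971,
        "Day-to-day cars": 3865, "Workday cars": 2182,
        "Weekend cruisers": 6315, "LD cars": 6354,
        "Short-haul cars": 2057, "All-rounders": 6659,
    },
    "California": {
        "Standing cars": 2842, "Moderate-range cars": 10142,
        "Day-to-day cars": 2227, "Workday cars": 953,
        "Weekend cruisers": 3737, "LD cars": 4010,
        "Short-haul cars": 1224, "All-rounders": 3843,
    },
}


def cluster_shares(frequencies):
    """Relative cluster sizes in whole percent of the regional sample."""
    total = sum(frequencies.values())
    return {name: round(100 * count / total) for name, count in frequencies.items()}


if __name__ == "__main__":
    for region, frequencies in PREMIUM_FREQUENCY.items():
        print(region, sum(frequencies.values()), cluster_shares(frequencies))
```

The two regional totals (45,590 and 28,978) add up to the 74,568 premium vehicles in the labeled sensor dataset.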

The comparison of conventional cars between Germany and California reveals some differences. Conventional LD cars (GER: 3%, CAL: 14%) are far more common in California. In Germany the rail

For example, short-haul cars (GER: 5%, CAL: 4%), LD cars (GER: 14%, CAL: 14%) and day-to-day cars (GER: 8%, CAL: 8%) have similar cluster sizes for new premium cars in Germany and California. This

Fig. 5. Premium cars: Distribution of the number of trips per vehicle for Germany and California. [x-axis: Number of trips per vehicle; y-axes: Trips (in %) and Cumulative Trips (in %); series: GER and CAL, each as trips and cumulative trips.]

The proportion of standing cars among premium vehicles (GER: 11%, CAL: 10%) is lower than among conventional vehicles (GER: 16%, CAL: 13%). However, it is high given that we considered mostly new premium vehicles. The reason could be that new premium vehicles are often used in households with more than one vehicle. This could spread the household's demand for car mobility across the available vehicles, which in turn would reduce the use of the observed vehicle. Moderate-range cars among premium cars (GER: 28%, CAL: 35%) recorded a larger share in California. This could result from the fact that California does not have a well-developed public transit network compared with Germany. As a result, Californians use their vehicles in an extended radius for various purposes (e.g. work, shopping). Further, conventional cars could be used only for specific purposes if they are the second car in a household. This would explain the lower relative sizes for conventional cars (GER: 17%, CAL: 22%). This hypothesis is consistent with the cluster size of premium workday cars (GER: 5%, CAL: 3%). The share in this cluster is very small because premium cars, in contrast to conventional cars (GER: 14%, CAL: 8%), are used for different purposes and not exclusively for workday trips. The share of weekend cruisers among premium vehicles (GER: 14%, CAL: 13%) is similar to that of the whole market. LD cars and all-rounders (GER: 15%, CAL: 13%) reveal typical premium car usage behavior. As a result of the higher comfort of premium vehicles, these cars are used disproportionately for longer distances. This is reflected by the share of LD cars as well. The classification according to vehicle usage profiles from the conventional market forms a solid basis for detailed analyses with the comprehensive information available in the sensor data.
For example, we can use the seat occupancy information and the classification of the premium vehicles into the car usage profiles to see how the occupancy rate differs across the usage profiles. Fig. 6 shows the average seat occupancy over all observed premium cars by assigned car usage profile. The sensor data made it possible to monitor the seat occupancy for each trip of every vehicle. The vehicles were then grouped according to their


illustrates the similar car usage of new premium cars, regardless of their spatial structure. However, if we take a closer look at the distribution of the VKT (see Fig. 4) and the number of trips (see Fig. 5), differences in premium vehicle use can be identified. Fig. 4 shows a lognormal distribution of the average distance travelled per trip by premium cars in Germany and California. The VKT per trip is consistent with the results of Pasaoglu et al. (2012). The average trip distance across Germany and California is 25.7 km. A closer look at the spatial context reveals differences in the distribution of the distance per trip: while trips in Germany cover an average of 30 km, trips in California cover only 19 km. However, on a daily basis, vehicles in California cover longer distances. This finding is complemented by the distribution of the number of trips shown in Fig. 5. German vehicles record an average of 67.7 trips in the observed month, compared with 93.9 trips in California. This difference is likely due to different spatial structures. In Germany it is not necessary to use a vehicle for all purposes (e.g. work, leisure, shopping), as access to public transportation is better developed than in California. This results in fewer trips per day in Germany than in California, while individual trips are longer. Another explanation for the differences could be that Californians are more car-oriented in their attitudes than Germans. Based on sociodemographic information, Hildebrand (2003) used a hierarchical method to define the optimal number of clusters. With the number of clusters given, he used the k-means algorithm to assign the individuals to clusters. No cluster was identified in which the car is not used for the majority of trips.
In contrast, the cluster analysis of Haustein and Nielsen (2016) identified non-car-oriented European mobility styles such as green public transport users and green cyclists. This is also confirmed by von Behren et al. (2018), who analyzed urban areas in San Francisco, Berlin and Shanghai and identified a higher orientation towards cars in San Francisco than in Berlin.

Fig. 4. Premium cars: Distribution of the average distance travelled per trip for Germany and California. [x-axis: Average Distance (in km); y-axes: Average Distance (in %) and Cumulative Average Distance (in %); series: GER and CAL, each as average distance and cumulative average distance.]


assigned car usage profile, and the average seat occupancy for each usage profile was calculated. 71% of the trips of premium workday cars were made only by the driver. Many of these trips could serve the purpose of commuting. In California, only 68.2% of the trips of premium workday cars were made only by the driver. This lower rate could be explained by the high-occupancy vehicle lanes implemented in the United States. Furthermore, workday cars show the lowest proportion of trips with driver and passenger (18.6%). The exact opposite can be observed for weekend cruisers, which have the highest share of trips with driver and passenger (29.6%) and the lowest share of trips made only by the driver (57.6%). This could be explained by leisure trips, which are more likely to be made in pairs and on weekends. Day-to-day cars have the highest proportion of trips where at least one back seat was occupied (15% = 5.1% + 9.9%). Here, the car could fulfil the purpose of chauffeuring individuals. The composition of seat occupancy differs only slightly between LD cars and short-haul cars. This could imply that the occupancy rate of a vehicle is determined more by the purpose of a trip and less by its distance. The data provided for the premium cars and their seat occupancy information support the intuition behind the car usage profiles. They further confirm that the assignment of premium vehicles to the eight car usage profiles is solid.
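The grouping step described above, in which trips are grouped by usage profile and the share of each occupancy category is computed, can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the category codes 1_0, 1_X, 2_0 and 2_X are taken from the legend of Fig. 6, and their reading (first character: driver only or driver plus front passenger; second character: no back seat or at least one back seat occupied) is an interpretation. The trip records and function names are hypothetical.

```python
from collections import Counter, defaultdict


def occupancy_category(front_passenger, back_seat_occupied):
    """Encode a trip as 1_0, 1_X, 2_0 or 2_X (assumed reading of the legend)."""
    front = "2" if front_passenger else "1"
    back = "X" if back_seat_occupied else "0"
    return f"{front}_{back}"


def occupancy_shares(trips):
    """trips: iterable of (usage_profile, front_passenger, back_seat_occupied).

    Returns, per usage profile, the share of trips in each occupancy category.
    """
    counts = defaultdict(Counter)
    for profile, front_passenger, back_seat_occupied in trips:
        counts[profile][occupancy_category(front_passenger, back_seat_occupied)] += 1
    return {
        profile: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for profile, counter in counts.items()
    }


if __name__ == "__main__":
    # Ten made-up trips of one hypothetical workday car.
    trips = ([("Workday cars", False, False)] * 7
             + [("Workday cars", True, False)] * 2
             + [("Workday cars", False, True)])
    shares = occupancy_shares(trips)["Workday cars"]
    back_seat_share = shares.get("1_X", 0.0) + shares.get("2_X", 0.0)
    print(shares, back_seat_share)
```

Summing the two categories with an occupied back seat mirrors the way the 15% figure for day-to-day cars is composed from 5.1% and 9.9%.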


The presented benefits of the results have to be seen in contrast to several limitations of using sensor data to analyze the car usage patterns.


• First, due to privacy reasons it is not possible to distinguish private


(2018): driving range 449 km,¹ VW ID.3 (2020): driving range 550 km, Polestar 2 (2020): driving range 500 km). If premium OEMs do not exceed these ranges, non-premium EVs could catch up in the premium market and acquire additional customers. Premium car sharing providers should offer their cars in areas with an extended radius, because driving distances are larger than for conventional car sharing providers.
• A higher willingness to pay could also explain the willingness to use premium cars very broadly and not for specific purposes. Perhaps the premium factor is the universal usability of a vehicle for any purpose (LD travel, storage space, comfort). This is in line with von Behren, Bönisch, Niklas, & Chlond (2020), who showed higher car use in households with a premium car.
• An analysis of seat occupancy reveals that seat occupancy is determined by driving purpose rather than distance. With regard to the current ecological discussion, car sharing providers and the responsible government could create incentives to increase the seat occupancy rate. The rental price could be based on seat occupancy: the more people in the vehicle, the lower the costs. An additional incentive would be the expansion of high-occupancy vehicle (HOV) lanes. The positive effects have already been examined by various studies (Delhomme & Gheorghiu, 2016; Guensler et al., 2019).

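Fig. 6. Premium cars: Seat occupancy by usage profile. [Stacked shares of trips, 0% to 100%, for each usage profile (Standing cars, Moderate-range cars, Day-to-day cars, Workday cars, Weekend cruisers, LD cars, Short-haul cars, All-rounders); legend: 1_0, 1_X, 2_0, 2_X.]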

6. Discussion and further research

This study used a machine learning technique, the quadratic SVM, to combine the usage of premium vehicles with the usage of conventional vehicles and to identify the premium factor in car usage. For this, two different datasets (survey and sensor data) on vehicle usage were clustered into eight car usage profiles. On this basis, we were able to compare the relative sizes of the car usage profiles and identify premium-specific car usage characteristics, such as a stronger use in LD travel. Using the seat occupancy information from premium cars, we could furthermore identify differences across the car usage profiles. This information helps us to adapt the requirements for alternative drive technologies or mobility services in the premium sector.

cars from commercial or company cars. The anonymization of the Vehicle Identification Number (VIN) to a Vehicle Anonymized Number (VAN) means that no information about the car owner or their socio-demographic characteristics is available.
• Second, the technology used to generate the data depends on a high-quality navigation system installed in the vehicle. Information on cars without these navigation systems is not available. As a result, there is a potential for bias in the car usage data, for example if cars equipped with additional items, such as high-quality navigation systems, are used more frequently.
• Third, the available data only allow the identification of the country in which the vehicle is located. A more detailed division of the activity space of a vehicle (urban and rural areas) could determine the differences in car usage between the countries in greater detail. This is also confirmed by Haustein and Siren (2015), who analyzed the mobility of older people and found that car-dependent seniors are overrepresented in rural areas.

To give a deeper understanding of the differences in car use between premium and conventional vehicles, the study leaves room for further research:

• Sensor data was used with no additional trip information (e.g. purpose of a trip). Further research could expand the sensor data with additional information on car use to determine for which purposes people actually use their cars, and how this differs by vehicle type and country. For this, it is no longer sufficient to define the car usage profile at the vehicle level. To analyze the dependency between car usage profiles and trip purposes, each trip of every vehicle must be complemented with individual information about the user. For this purpose, recurrent car usage patterns linking individual trips (e.g. distance and time) and trip purposes have to be learned. As a result of this process, the identified driving patterns

• Premium cars are used more intensively (low proportion of standing cars) and for longer distances (high proportion of LD cars). For this reason, the premium factor for EVs could be more strongly determined by their driving range. For example, Tesla's Model S and Model X, with a driving range of 450 km in winter and 350 km in summer, are already EVs that are used disproportionately for LD trips compared to EVs from other brands (Figenbaum & Nordbakke, 2019; Nicholas, Tal, & Turrentine, 2017). In the near future, non-premium OEMs will launch EVs with an acceptable range (e.g. Hyundai Kona Electric

1 Electric driving ranges calculated with Worldwide harmonized Light vehicles Test Procedure (WLTP).


can then be used to transfer the information on trip purpose to the objective sensor data.
• Data points such as car age and number of car users were not available. With this information, differences in car use relating to the life cycle of a vehicle could be analyzed. At the start of their life, many premium cars are used as company cars in Germany. After a few years, these vehicles are offered on the private market. This leads to a change in ownership and a corresponding fundamental change in car usage. An analysis of the different car usage patterns based on car age and market location provides an opportunity for future research.
• Higher seat occupancy rates can reduce traffic volumes and congestion. Analyzing the effect of HOV lanes on the seat occupancy rate would give new insights into how people change their travel behavior when they can reduce their driving time or costs.

In summary, the initial question of how the usage of premium cars differs from that of conventional vehicles could be answered with the help of the framework of Eisenmann and Buehler (2018) and the resulting comparison of the relative car usage profile sizes. We identified a greater importance of LD travel for premium vehicles. In addition, premium cars are generally used more intensively and for various purposes. This is also supported by the fact that the relative sizes of premium car usage profiles differ only slightly between California and Germany. Furthermore, we analyzed the seat occupancy information of premium cars across the car usage profiles. Here, it was found that seat occupancy is influenced more by the purpose of a trip than by the distance travelled. This study supports a better understanding of market needs and premium car fleet productivity with respect to the operational challenges for new mobility services and alternative propulsion technologies, such as car sharing and electric mobility.

Declaration of Competing Interest

None.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We would like to thank the anonymous reviewers for their suggestions and comments. The remarks helped us considerably to improve our paper.

References

Abe, S. (2010). Support vector machines for pattern classification. Advances in pattern recognition (2nd ed.). London: Springer-Verlag London Limited.
Auer, M., Bogenberger, K., Rehborn, H., Koller, M., & Palmer, J. (2017). Mobilitätskennwerte für den motorisierten Individualverkehr aus Flottendaten. Straßenverkehrstechnik, 61, 87–93.
Axhausen, K. W., Zimmermann, A., Schönfelder, S., Rindsfüser, G., & Haupt, T. (2002). Observing the rhythms of daily life: A six-week travel diary. Transportation, 29(2), 95–124.
Bäumer, M., Hautzinger, H., Kuhnimhof, T., & Pfeiffer, M. (2018). The German vehicle mileage survey 2014: Striking the balance between methodological innovation and continuity. Transportation Research Procedia, 32, 329–338.
von Behren, S., Bönisch, L., Niklas, U., & Chlond, B. (2020). Linking individuals' affective and instrumental motives to their car use – An application of an integrated choice and latent variable model. 99th Annual Meeting of the Transportation Research Board.
von Behren, S., Minster, C., Esch, J., Hunecke, M., Vortisch, P., & Chlond, B. (2018). Assessing car dependence: Development of a comprehensive survey approach based on the concept of a travel skeleton. Transportation Research Procedia, 32, 607–616.
Bricka, S. G., Sen, S., Paleti, R., & Bhat, C. R. (2012). An analysis of the factors influencing differences in survey-reported and GPS-recorded trips. Transportation Research Part C: Emerging Technologies, 21(1), 67–88.
Chlond, B., Weiss, C., Heilig, M., & Vortisch, P. (2014). Hybrid modeling approach of car uses in Germany on basis of empirical data with different granularities. Transportation Research Record: Journal of the Transportation Research Board, 2412(1), 67–74.
Delhomme, P., & Gheorghiu, A. (2016). Comparing French carpoolers and non-carpoolers: Which factors contribute the most to carpooling? Transportation Research Part D: Transport and Environment, 42, 1–15.
Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263–286.
Ding, S., Zhu, Z., & Zhang, X. (2017). An overview on semi-supervised support vector machine. Neural Computing and Applications, 28(5), 969–978.
Ecke, L., Chlond, B., Magdolen, M., Eisenmann, C., Hilgert, T., & Vortisch, P. (2018). Deutsches Mobilitätspanel (MOP) - Wissenschaftliche Begleitung und Auswertungen Bericht 2017/2018: Alltagsmobilität und Fahrleistung.
Eisenmann, C. (2018). Mikroskopische Abbildung von Pkw-Nutzungsprofilen im Längsschnitt: Karlsruhe.
Eisenmann, C., & Buehler, R. (2018). Are cars used differently in Germany than in California? Findings from annual car-use profiles. Journal of Transport Geography, 69(C), 171–180.
Elango, V. V., Guensler, R., & Ogle, J. (2007). Day-to-day travel variability in the commute Atlanta, Georgia, study. Transportation Research Record: Journal of the Transportation Research Board, 2014(1), 39–49.
Figenbaum, E., & Nordbakke, S. (2019). Battery electric vehicle user experiences in Norway's maturing market.
Guensler, R., Ko, J., Kim, D., Khoeini, S., Sheikh, A., & Xu, Y. (2019). Factors affecting Atlanta commuters' high occupancy toll lane and carpool choices. International Journal of Sustainable Transportation, 1–12.
Haustein, S., & Nielsen, T. A. S. (2016). European mobility cultures: A survey-based cluster analysis across 28 European countries. Journal of Transport Geography, 54, 173–180.
Haustein, S., & Siren, A. (2015). Older people's mobility: Segments, factors, trends. Transport Reviews, 35(4), 466–487.
Hildebrand, E. D. (2003). Dimensions in elderly travel behaviour: A simplified activity-based model using lifestyle clusters. Transportation, 30(3), 285–306.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2017). An introduction to statistical learning: With applications in R (Corrected at 8th printing). Springer texts in statistics. New York, Heidelberg, Dordrecht, London: Springer.
Kelly, P., Krenn, P., Titze, S., Stopher, P., & Foster, C. (2013). Quantifying the difference between self-reported and global positioning systems-measured journey durations: A systematic review. Transport Reviews, 33(4), 443–459.
Khan, M., & Kockelman, K. M. (2012). Predicting the market potential of plug-in electric vehicles using multiday GPS data. Energy Policy, 46, 225–233.
Kullingsjö, L.-H., & Karlsson, S. (2012). The Swedish car movement data project. Belgium: Proceedings to EEVC Brussels, 19–22.
Löchl, M. (2005). Stability of travel behaviour: Thurgau 2003. Travel Survey Metadata Series, 16.
Nicholas, M. A., Tal, G., & Turrentine, T. S. (2017). Advanced plug-in electric vehicle travel and charging behavior interim report. CA: Davis.
Pasaoglu, G., Fiorello, D., Martino, A., Scarcella, G., Alemanno, A., Zubaryeva, C., & Thiel, C. (2012). Driving and parking patterns of European car drivers: A mobility survey. EUR (Luxembourg. Online). 25627. Luxembourg: Publications Office.
Pearre, N. S., Kempton, W., Guensler, R. L., & Elango, V. V. (2011). Electric vehicles: How much range is required for a day's driving? Transportation Research Part C: Emerging Technologies, 19(6), 1171–1184.
Schönfelder, S. (2006). Urban rhythms: Modelling the rhythms of individual travel behaviour. (ETH Zurich).
Schuessler, N., & Axhausen, K. W. (2009). Processing raw data from global positioning systems without additional information. Transportation Research Record: Journal of the Transportation Research Board, 2105(1), 28–36.
Stopher, P., FitzGerald, C., & Xu, M. (2007). Assessing the accuracy of the Sydney household travel survey with GPS. Transportation, 34(6), 723–741.
Susilo, Y. O., & Axhausen, K. W. (2014). Repetitions in individual daily activity–travel–location patterns: A study using the Herfindahl–Hirschman index. Transportation, 41(5), 995–1011.
Weiß, C., Chlond, B., Hilgert, T., & Vortisch, P. (2016). Deutsches Mobilitätspanel (MOP): Wissenschaftliche Begleitung und Auswertungen Bericht 2015/2016: Alltagsmobilität und Fahrleistung: Karlsruhe.
Wolf, J., Oliveira, M., & Thompson, M. (2003). Impact of underreporting on mileage and travel time estimates: Results from global positioning system-enhanced household travel survey. Transportation Research Record: Journal of the Transportation Research Board, 1854(1), 189–198.
Zumkeller, D. (2009). The dynamics of change - 15 years German mobility panel. TRB 88th Annual Meeting Compendium of Papers (Washington, D.C.), 2009.
Zumkeller, D., Chlond, B., Last, J., & Manz, W. (2006). Long-distance travel in a longitudinal perspective: The INVERMO approach in Germany. TRB 85th Annual Meeting Compendium of Papers (Washington, D.C.), 2006.