Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 109C (2017) 905–910
The 6th International Workshop on Agent-based Mobility, Traffic and Transportation Models, Methodologies and Applications (ABMTrans 2017)
An Agent-Based Simulation of Air Travel Itinerary Choice
Roger A Parker, PhD*
AirMarkets Corporation, 20222 23rd Place NW, Shoreline, WA 98177 USA
Abstract
The AirMarkets Simulator™ utilizes an agent-based model of individual air travel passengers to represent and hence explore the commercial air travel system. The ABM is built on a discrete choice random utility model, and simultaneously replicates the behavior of every passenger travelling in every air market in the world. The discrete choice decision model is described in detail, along with how it portrays the dynamics of air travel and air travel pricing. Sample results are offered, including an illustration of the empirical limits of a random market behavior such as air travel. © 2016 Roger A. Parker, PhD.
Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs.
Keywords: Air Itinerary Choice Modeling, OD Demand Simulation, Global Air Travel Modeling, Air Passenger Agents, Inherent Demand Variation
*Corresponding Author. Tel: +1-206-949-2059; Fax: None
Email address: [email protected]
1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs. 10.1016/j.procs.2017.05.419

1. Introduction

The AirMarkets Agent-Based Model Air Travel Simulator (the Simulation) is an agent-based model of the world-wide commercial air travel market. It consists of approximately 27 million passenger agents (pags), each representing one or more individual passengers traveling together (for a total of approximately 42 million passengers); a little over a thousand airline revenue and scheduling agents (arasags), which represent the scheduled air service throughout the world along with service provided by charter and on-demand companies; and over 160 distribution system agents (dsags), which represent entities engaged in providing information on available air travel itineraries from arasags to pags. This paper focuses on the modelled behaviour of pags.

The overall conceptual logic of the Simulation represents the behaviour of the individual agents during the ticketing process for available flights. A single scheduled aircraft can carry passengers traveling in many markets, so the decision by one pag has a direct impact on the actions of subsequent passengers that may wish to use the same airplane. This is the key dynamic underlying the economy of commercial air travel, and its representation is a central purpose of the Simulation. In addition, there is no other way of portraying the inherent variability of air travel behaviour, so the agent approach is justified a fortiori.

Travel is modelled for one week, since in general the world's airline schedule repeats weekly. The current schedule of flights is readily available from the Official Airline Guide (OAG). Each pag in the synthetic population (or synpop) is assigned an origin and destination city for its trip, based on global origin-destination demand distribution matrices previously prepared (see Carson et al.1). It is then assigned a booking instant (when the tickets are purchased) using the probability distribution function for air travel bookings (see Parker2). Ticketing begins 127 days before the last departure day of a standard week. Pag processing then consists of the pags requesting ticket availability from dsags, in booking-instant order, and each receiving a list of available flight options. The pag then chooses one, if it can, and purchases the ticket, which is then no longer available to subsequently booking pags. As the booking process proceeds, pags which have already purchased tickets are subjected to a stochastic cancellation process at the agent level, so the seats purchased may be returned to the available inventory. On a periodic basis (by default once a day, though this can be changed), the arasags execute a revenue management update, which can result in ticket pricing changes. When all pags have been processed, the simulation execution is complete and the results are produced.

The primary results of the simulation are estimates of load and revenue, by directional city-pair market, for all individual aircraft traveling anywhere in the world. These results can be aggregated as desired for additional exploration and analysis. In addition, the configuration of available air service (routing, timing, air fare, connections, etc.) can be altered at will, and the results compared to current operations. Most importantly, the simulation replicates the inherent variation found in air travel behaviour, and does so using logic only available in an agent-based model.

The Simulator coordinates and manages the actions of the agents using a messaging architecture. All the agents in the simulation communicate with one another by passing messages through a message queue. Messages addressed to other agents are posted there, and agents read messages from the queue when they have time available.
This allows agents to operate semi-independently of one another while maintaining coordinated action. This process of selection and booking is managed in the Simulator by employing the messaging features of Microsoft’s .NET platform. This service allows independent processing threads (possibly on different computers) to communicate with one another by passing messages through the queue maintained by a designated processor.
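The booking cycle described in this section can be sketched as a small discrete-event loop. This is a minimal illustration under stated assumptions, not the Simulator's C# implementation: the `Flight`, `Pag` and `run_booking` names, the `choose` callback (a stand-in for the pag's discrete choice model), the full refund on cancellation, and the rule that a cancellation takes effect at the end of a whole simulated day are all hypothetical, and the arasags' revenue management updates are omitted.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass
class Flight:
    """Hypothetical seat inventory for one fare class on one scheduled aircraft."""
    name: str
    seats: int
    fare: float


@dataclass(order=True)
class Pag:
    """Hypothetical passenger agent; instances sort by booking instant only."""
    book_day: float                        # days after ticketing opens (0 to 127)
    pag_id: int = field(compare=False)


def run_booking(pags, flights, choose, cancel_prob, horizon_days, rng):
    """Book pags in booking-instant order with agent-level stochastic cancellation.

    Assumptions (not from the paper): a cancelled seat is fully refunded, and a
    cancellation decided at booking time is executed at the end of a whole day.
    """
    queue = list(pags)
    heapq.heapify(queue)                   # earliest booking instant first
    cancels = []                           # heap of (cancel_day, pag_id, flight)
    bookings = {}                          # pag_id -> Flight
    revenue = {f.name: 0.0 for f in flights}

    while queue:
        pag = heapq.heappop(queue)
        # End-of-day review: execute cancellations whose day has already ended.
        while cancels and cancels[0][0] + 1 <= pag.book_day:
            _, pid, flight = heapq.heappop(cancels)
            flight.seats += 1              # seat returns to the available inventory
            revenue[flight.name] -= flight.fare
            del bookings[pid]

        available = [f for f in flights if f.seats > 0]
        flight = choose(pag, available) if available else None
        if flight is None:
            continue                       # the pag could not, or chose not to, book
        flight.seats -= 1                  # seat is no longer available to later pags
        revenue[flight.name] += flight.fare
        bookings[pag.pag_id] = flight
        # Decide now whether, and on which day, this booking will be cancelled.
        if rng.random() < cancel_prob:
            day = rng.randint(int(pag.book_day), horizon_days)
            heapq.heappush(cancels, (day, pag.pag_id, flight))
    return bookings, revenue


# Usage: four pags, two fare classes, and a "cheapest available" stand-in chooser.
flights = [Flight("F1", 2, 100.0), Flight("F2", 5, 150.0)]
pags = [Pag(book_day=d, pag_id=i) for i, d in enumerate([3.5, 1.2, 2.0, 0.5])]
cheapest = lambda pag, avail: min(avail, key=lambda f: f.fare)
bookings, revenue = run_booking(pags, flights, cheapest, 0.0, 127, random.Random(1))
print({pid: f.name for pid, f in sorted(bookings.items())}, revenue)
```

Because the pags are popped in booking-instant order, the two earliest bookers take the cheaper fare class and later pags find it sold out, which is the inter-agent dependency the Simulation is built to capture.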
2. The Pag Itinerary Choice Protocol

Central to the valid operation of the Simulator is the pag itinerary choice. The choice model is a mixed logit model with random coefficients. Each pag in the simulation has four itinerary choice utility functions associated with it. The four versions arise from the combination of trip purpose (business or leisure) and travel time sensitivity (arrival or departure). They all have the same form, differing only in the empirical coefficients. For example, the fare coefficient is generally more negative for leisure than for business travel for the same pag, reflecting an individual's higher sensitivity to fare compared to a business. In addition, depending on its journey structure, a pag can be arrival time sensitive or departure time sensitive.

The general form of the utility function for the pag choice protocol is as follows. Define an origin-destination market (OD market) to be any directional city pair with a well-defined origin city and destination city. Consider any market m for which air travel demand is not zero, and denote the set of available itinerary fare classes in m by $\Omega(m)$. A fare class is defined to be a set of tickets aboard a specific aircraft being offered at the same price. A specific scheduled aircraft usually offers at least three fare classes, and often considerably more. In the general random utility formulation, it is asserted that the pag, denoted by i, has before it a finite set of choices, for each of which that agent has determined a real number called a utility. The utility U(i,j) for pag i and alternative itinerary j is modelled as the sum of an observed component V(i,j) and an unobserved component $\varepsilon(i,j)$. That is,

$$U(i,j) = V(i,j) + \varepsilon(i,j). \qquad (1)$$
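Equation 1, combined with the EV1 assumption stated in the next paragraph, implies the familiar logit choice probabilities $e^{V(i,j)} / \sum_k e^{V(i,k)}$. The sketch below checks this numerically for a single hypothetical pag with made-up observed utilities: it adds independent standard Gumbel (EV1) draws to the observed utilities, records which alternative has the highest total utility, and compares those frequencies with the closed-form logit formula. The function names and values are illustrative only.

```python
import math
import random


def gumbel(rng):
    """Draw a standard Extreme Value Type 1 (Gumbel) variate by inverse transform."""
    u = rng.random()
    while u == 0.0:                  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def logit_probs(v):
    """Closed-form logit probabilities exp(V_j) / sum_k exp(V_k)."""
    m = max(v)                       # subtract the max for numerical stability
    w = [math.exp(x - m) for x in v]
    s = sum(w)
    return [x / s for x in w]


def simulated_choice_shares(v, draws, rng):
    """Share of draws in which each alternative maximizes U = V + epsilon."""
    counts = [0] * len(v)
    for _ in range(draws):
        u = [vj + gumbel(rng) for vj in v]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]


v = [1.0, 0.5, -0.2]                 # hypothetical observed utilities V(i, j)
print(logit_probs(v))
print(simulated_choice_shares(v, 100_000, random.Random(42)))
```

The two printed lists agree to within sampling error, which is why the Simulator can compute choice probabilities directly from V(i,j) rather than simulating the unobserved term.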
The term V(i,j) is called observable because experiments can be conducted which allow the utility to be estimated as a function of the attributes of the itinerary fare-class alternative and the characteristics of the agent. The unobserved term, however, is a random variable, and represents the aspect of the decision process that cannot be observed. It is assumed here, in keeping with the underlying random utility conceptualization, that $\varepsilon(i,j)$ is independently and identically distributed as an Extreme Value Type 1 (EV1) distribution. (See McFadden3, Ben-Akiva and Lerman4, Train5 and Louviere, Hensher and Swait6 for more thorough discussions of random utility theory.) Then, for pag i and for itinerary fare class $j \in \Omega(m)$, the observed utility V(i,j) is defined by the following utility equation:

$$V(i,j) = \beta_f(i)\ln f(j) + d(j)\left[\beta_d(i) + \beta_{bd}(i)\ln d_{base}\right] + \beta_{dc}(i)\,N_{dc}(j) + \beta_{ic}(i)\,N_{ic}(j) + \beta_{1st}(i)\,X_{1st}(j) + \beta_{ec}(i)\,X_{ec}(j) + G(\theta(i) - t(j)) + \sum_{a \in \mathcal{A}} I(a,j)\left[\alpha(i,a) + \beta_F(i)\,F(i,a)\right] \qquad (2)$$
The Greek letters in this equation are empirical coefficients which represent the importance of the associated variable with respect to itinerary choice. How they are generated is explained later in this discussion, following the presentation of the ideal time representation. The first seven independent variables in the equation above are these:
• f(j) is the fare (in 2015 US dollars) of itinerary fare class j, and ln f(j) is the natural log of that fare. The natural log gives a better fit, reflecting the fact that a $100 fare increase on a base fare of $1000 is perceived differently than the same increase on a base fare of $100.
• d(j) is the duration of itinerary fare class j.
• $d_{base}$ is the base (shortest) duration of all the itineraries in the set $\Omega(m)$. The base duration arises in the formulation because the pag is considered to compare a given itinerary with the best available itinerary, which, ceteris paribus, is the alternative with the shortest travel time.
• $N_{dc}(j)$ is the number of direct connections (often called online connections), between aircraft of the same airline or of airlines in the same alliance, in itinerary j.
• $N_{ic}(j)$ is the number of indirect (or interline, between aircraft of different airlines) connections in itinerary j. Indirect connections are considered less convenient, and hence have lower utility, than direct connections.
• $X_{1st}(j)$ is a dummy variable equal to one if the itinerary fare class uses the first-class cabin on the aircraft, and zero otherwise.
• $X_{ec}(j)$ is a dummy variable equal to one if the fare class uses the main cabin on the aircraft, and zero otherwise. If both $X_{1st}(j)$ and $X_{ec}(j)$ are zero, then the business class cabin is assumed (the business cabin is the reference value for the indicator variables $X_{1st}(j)$ and $X_{ec}(j)$).

Recall that a pag can be either departure or arrival time sensitive. Each of the two time-sensitivity variants of the utility function has the same form, but different parameters. The function $G(\theta(i) - t(j))$ defines the time-of-day utility structure, and is given by a pair of Box-Cox transformations, one for early times and one for late times, which surround an interval of time about the ideal time called the indifference window, within which the pag has no time-related disutility. Specifically, the function G is defined as
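$$G(\theta(i) - t(j)) = \begin{cases} \beta_E(i)\,\dfrac{\left(t(j) - \theta(i) + a + 1\right)^{\lambda_E} - 1}{\lambda_E}, & \theta(i) - t(j) < a \\[4pt] 0, & a \le \theta(i) - t(j) \le b \\[4pt] \beta_L(i)\,\dfrac{\left(\theta(i) - t(j) - b + 1\right)^{\lambda_L} - 1}{\lambda_L}, & \theta(i) - t(j) > b \end{cases} \qquad (3)$$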
In this equation, the following quantities are used:
• $\theta(i)$ is the ideal (departure/arrival) time for pag i.
• t(j) is the departure/arrival time of itinerary j.
• $\lambda_E$ and $\lambda_L$ are empirically derived parameters characterizing the Box-Cox representation of the disutility curves for early and late itinerary departure/arrival times, respectively.
• a and b are the bounds of the indifference window, within which the pag is indifferent to the itinerary departure/arrival time.

This specification thus stipulates that the disutility of not departing at the desired departure time (or, alternatively, not arriving at the desired time) differs depending on whether the actual departure/arrival time is before the desired time (early) or after it (late). The disutility curve is therefore shaped like that shown in Figure 1. These curves are fitted to empirical data using a general-purpose Box-Cox transformation, which describes the available data reasonably well. The indifference window (the flat segment between a and b) reflects the fact that, within some tolerance, a pag does not care whether a flight is early or late. This expression and its derivation and estimation are described in detail in Parker and Walker7.

Figure 1: The ideal departure/arrival time disutility curve. (Disutility plotted against time of day, from morning to evening; the curve is flat between the window bounds a and b around the ideal departure time, and flights departing at t1 in the morning and t2 in the evening are marked with their disutilities.)

The final terms in the utility shown in Equation 2 depend explicitly on which airline is operating itinerary j. Let $\mathcal{A}$ be the set of all airlines operating in the market in question, and let the function I(A,j) be an indicator variable which is one if A operates itinerary j, and zero otherwise. The term F(i,A) represents the frequent flyer mileage pag i has with airline A.

There is a good deal of research yet to be done on the relationship between a passenger's itinerary preference and the airline operating the itinerary. Carriers have mixed opinions on the importance of cabin attendant attitude, on-time performance, cabin cleanliness, and other features of the flight experience directly under the control of the carrier. Incorporating such measures into a discrete choice model is also difficult, although item response theory through Rasch scaling (Fox and Bond8, and von Davier and Carstensen9) holds great promise in this regard. In addition, the inclusion of these two airline terms in the equation makes the utility function adaptive, in the sense that passenger agents with more experience with a specific carrier behave differently than those with less experience. Adaptive agents are rarely encountered in the agent-based model world, but are clearly appropriate here. The implications of this adaptivity for the passenger choice model are significant.

The various $\beta(i)$'s in Equations 2 and 3 are empirical coefficients for disutility assigned to each individual pag in the synpop. These coefficients reflect the values assigned to the individual attributes of the itineraries by the specific pag i. Collectively, this set of empirical parameters is referred to as the agent characteristics. The specific characteristic values associated with a given pag are determined by generating random numbers using a probability distribution that has been estimated from analysis of the incidence of each potential characteristic value found in the traveling population. How this was done, including the details of the research conducted to estimate the parameters of the relevant probability distributions, is discussed in Parker10. All the parameters except those associated with the G function have a lognormal distribution; the parameters in the G function have normal distributions. Each pag has four sets of these parameters, one for each of the four travel circumstances: business or leisure travel, combined with departure or arrival sensitivity.
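Equations 2 and 3, together with the coefficient distributions just described, can be sketched as a few plain functions. This is an illustrative sketch only, not the Simulator's code: the dictionary keys, every numeric value, the half-hour window bounds, and the convention of negating a lognormal draw for attributes that should lower utility are assumptions, not the estimates discussed in Parker10. Durations and times are in hours and fares in US dollars; `theta` is the pag's ideal time, and the hypothetical `airline_pref` and `ff_miles` dictionaries stand in for the airline-preference and frequent-flyer terms of Equation 2.

```python
import math
import random

# Hypothetical settings; the estimated values are in Parker (2010), not reproduced here.
# key: (mu, sigma) of the underlying normal, and the sign applied to the lognormal draw.
LOGNORMAL = {
    "fare": (-0.5, 0.3, -1),        # higher fares reduce utility
    "dur": (-2.0, 0.5, -1),         # per-hour duration disutility
    "dur_base": (-4.0, 0.5, +1),    # interaction with ln(d_base)
    "direct": (-1.0, 0.4, -1),      # online connections
    "interline": (-0.5, 0.4, -1),   # interline connections
    "first": (-1.0, 0.5, +1),       # first cabin, relative to business cabin
    "econ": (-1.0, 0.5, -1),        # main cabin, relative to business cabin
    "ff": (-9.0, 0.5, +1),          # per frequent flyer mile
}
LAM_E, LAM_L = 0.5, 0.5             # Box-Cox exponents (hypothetical)
A, B = -0.5, 0.5                    # indifference window bounds in hours (hypothetical)
CIRCUMSTANCES = [(p, s) for p in ("business", "leisure") for s in ("departure", "arrival")]


def sample_coefficients(rng):
    """One coefficient set per travel circumstance: lognormal outside G, normal inside G."""
    sets = {}
    for circ in CIRCUMSTANCES:
        c = {k: sign * rng.lognormvariate(mu, sd) for k, (mu, sd, sign) in LOGNORMAL.items()}
        c["beta_e"] = rng.normalvariate(-0.3, 0.1)
        c["beta_l"] = rng.normalvariate(-0.4, 0.1)
        sets[circ] = c
    return sets


def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1) / lam, with its log limit at lam == 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def G(x, beta_e, beta_l, lam_e=LAM_E, lam_l=LAM_L, a=A, b=B):
    """Equation 3 evaluated at x = theta(i) - t(j)."""
    if x < a:
        return beta_e * box_cox(-x + a + 1.0, lam_e)   # t(j) - theta(i) + a + 1
    if x > b:
        return beta_l * box_cox(x - b + 1.0, lam_l)    # theta(i) - t(j) - b + 1
    return 0.0                                          # inside the indifference window


def utility(c, itin, theta, airline_pref, ff_miles):
    """Equation 2 for one pag's coefficients c and one itinerary fare class itin."""
    v = c["fare"] * math.log(itin["fare"])
    v += itin["dur"] * (c["dur"] + c["dur_base"] * math.log(itin["dur_base"]))
    v += c["direct"] * itin["n_direct"] + c["interline"] * itin["n_interline"]
    v += c["first"] * itin["x_first"] + c["econ"] * itin["x_econ"]
    v += G(theta - itin["t"], c["beta_e"], c["beta_l"])
    # The indicator I(a, j) keeps only the airline that operates this itinerary.
    op = itin["airline"]
    v += airline_pref.get(op, 0.0) + c["ff"] * ff_miles.get(op, 0.0)
    return v


coeffs = sample_coefficients(random.Random(3))[("leisure", "departure")]
itin = dict(fare=300.0, dur=5.0, dur_base=4.0, n_direct=1, n_interline=0,
            x_first=0, x_econ=1, t=8.0, airline="XX")
print(utility(coeffs, itin, theta=9.5, airline_pref={}, ff_miles={}))
```

Drawing the four coefficient sets once per pag, and then selecting the set that matches the trip purpose and time sensitivity, mirrors the per-agent heterogeneity described above; the specific distributions here are placeholders.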
Under this utility function structure, the probabilities associated with the utility functions are of mixed logit form, with the mixing distribution being the distribution of ideal departure/arrival times in the population of that particular market. That is, the probability of choosing j from the set of available itinerary fare classes is given by the following:
$$p_i(j) = \int_0^W \frac{e^{V(i,j\mid\theta)}}{\sum_{k \in \Omega(m)} e^{V(i,k\mid\theta)}}\, d\Phi(\theta), \qquad (4)$$
where [0, W] is the week time interval and $\Phi(\theta)$ is the distribution of ideal departure/arrival times in the population over the period [0, W]. The distribution function $\Phi(\theta)$ is estimated from empirical data. In an agent-based representation of the travel market, the mixture probability $p_i(j)$ is easily computed. If S(m) is the set of passengers traveling in market m (the OD demand for market m), and #(S(m)) is the number of elements in S(m), then
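$$p_i(j) = \int_0^W \frac{e^{V(i,j\mid\theta)}}{\sum_{k \in \Omega(m)} e^{V(i,k\mid\theta)}}\, d\Phi(\theta) \approx \frac{1}{\#(S(m))} \sum_{i \in S(m)} \frac{e^{V(i,j\mid\theta(i))}}{\sum_{k \in \Omega(m)} e^{V(i,k\mid\theta(i))}}. \qquad (5)$$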
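Equation 5 replaces the integral over the ideal-time distribution with an average over the pags in market m, each of which carries its own ideal time. The sketch below computes that average under simplifying assumptions that are not the paper's: all pags share one hypothetical utility function (a fare term plus a quadratic time-of-day penalty standing in for the Box-Cox G), the itineraries and fares are made up, and the ideal times are drawn from a hypothetical two-peak distribution.

```python
import math
import random


def logit(v):
    """Logit probabilities exp(V_j) / sum_k exp(V_k), computed stably."""
    m = max(v)
    w = [math.exp(x - m) for x in v]
    s = sum(w)
    return [x / s for x in w]


def market_mixture(thetas, itineraries, utility):
    """Right-hand side of Equation 5: average of each pag's logit probabilities."""
    totals = [0.0] * len(itineraries)
    for theta in thetas:
        p = logit([utility(theta, it) for it in itineraries])
        totals = [t + q for t, q in zip(totals, p)]
    return [t / len(thetas) for t in totals]


def toy_utility(theta, itin):
    """Hypothetical utility: fare term plus a quadratic penalty for time-of-day mismatch."""
    dep, fare = itin
    return -0.01 * fare - 0.1 * (theta - dep) ** 2


itineraries = [(7.0, 250.0), (12.0, 200.0), (18.0, 230.0)]   # (departure hour, fare)
rng = random.Random(4)
# Hypothetical ideal-time distribution: a morning peak and an evening peak.
thetas = [rng.gauss(8.0, 1.5) if rng.random() < 0.6 else rng.gauss(18.0, 1.5)
          for _ in range(5000)]
print(market_mixture(thetas, itineraries, toy_utility))
```

Even though the noon itinerary is the cheapest, the averaged probabilities favour the morning and evening flights, because the mixture reflects where the pags' ideal times actually fall.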
The pag uses its discrete choice model to compute the probability of selecting each of the offered itineraries. A random number generator is then used to select an itinerary based on the probabilities determined in Equation 5. The options are put in some arbitrary order, and the cumulative distribution function of the available choices is created. This is a discrete probability distribution, so its cumulative function consists of jumps equal to the probability of each choice. A random number generator then produces a uniformly distributed random number between zero and one, and the choice corresponding to that number is found by searching the cumulative distribution function. Well over 20 million choice decisions occur in each simulation run, so the random number generators must be of very high quality. The random numbers for the simulation are created using a Mersenne Twister approach, and the algorithms used for the number generation are modelled after those developed by Troscheutz11.

The itinerary choice discussion above has been couched in terms of alternative flights or sequences of flights that connect a given origin-destination market. But that is not exactly the choice being made by the pag. In fact, the pag is selecting a cabin and fare class on board a flight. For example, a US domestic flight might offer five different fares on the same physical flight. There might be two different offerings in the first-class cabin, one which is unrestricted and another which does not allow the fare to be refunded if the ticket is not used, only applied to another flight (with a change fee deducted). In the main cabin, there might be three different options: a full fare which is fully refundable if not used, a slightly less expensive fare class which only allows the cost of an unused ticket to be applied to another flight, and a third, very low fare which is not refundable if cancelled and must be purchased at least 30 days in advance.
It is the practice in the industry to offer many fares on board the same flight, which differ not only in cost to the passenger but also in refund conditions, advance purchase requirements, or other subtle features. Recent practice in the US, for example, is to charge extra for checked luggage and on-board refreshments in less expensive fare classes. The itinerary choice options are then the different fare classes on each flight itinerary, rather than the itineraries alone. Nothing else in the analysis changes.
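The selection step described earlier in this section (ordering the options, building the cumulative distribution, and locating a uniform draw in it) can be sketched with Python's standard library, whose `random.Random` generator is itself a Mersenne Twister. Since the alternatives are fare classes on specific flights, each option is labelled with a hypothetical (flight, fare class) pair; the labels, probabilities and the `choose` function are illustrative assumptions, not the Simulator's code.

```python
import bisect
import itertools
import random


def choose(options, probs, rng):
    """Pick one option by inverse-CDF search over its discrete choice probabilities."""
    cdf = list(itertools.accumulate(probs))    # jumps of size probs[k] at option k
    u = rng.random() * cdf[-1]                 # uniform on [0, total); absorbs rounding in the sum
    k = bisect.bisect_right(cdf, u)            # first option whose cumulative value exceeds u
    return options[min(k, len(options) - 1)]   # guard against a floating-point edge case


# Hypothetical fare-class alternatives: (flight, cabin/fare class) pairs in arbitrary order.
options = [
    ("flight_1", "first_unrestricted"),
    ("flight_1", "main_refundable"),
    ("flight_1", "main_nonrefundable_30day"),
    ("flight_2", "main_nonrefundable_30day"),
]
probs = [0.05, 0.20, 0.45, 0.30]
rng = random.Random(2017)                      # CPython's random.Random is a Mersenne Twister
counts = {o: 0 for o in options}
for _ in range(20_000):
    counts[choose(options, probs, rng)] += 1
print(counts)
```

Binary search keeps each draw at O(log n) in the number of fare-class alternatives, and an option with zero probability can never be returned, because its cumulative value equals that of the option before it.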
3. The Simulator Operation

The AirMarkets Simulator currently operates on an eight-processor Intel chip with 16 GB of available memory. This has been found to be the most efficient system architecture. Multiple processors on separate computers have proven less effective, since the time required to move data between them becomes an inhibiting factor. The simulation is written in C# and runs on the .NET platform under Windows 10.

The pags are stored in the order in which they are to be booked. When a pag is created, it is assigned a booking time. This time is measured in tenths of a second before the pag's specified departure date and time, up to 127 days prior to the final minute of the simulated week. This limit was chosen because there is no empirical evidence of any scheduled flight fare class anywhere in the world being fully sold earlier than that. The execution of the simulation consists of loading each pag from its disk storage into main memory in booking-time order. After loading, the pag is assigned to one of several hundred available parallel processing modules or threads (BackgroundWorker objects in C#). In that module, the available itineraries for the pag's journey are assembled by the appropriate dsag from the arasags that serve the pag's market, the probability of each is computed using the discrete choice model that represents the trip purpose and time dependency, and the selection is made using the random number generator. Once an itinerary is selected, the revenue from the booking is accrued to the airline operating the flights in the selected itinerary, and the seats are removed from the aircraft's inventory so they are no longer available to pags which book later. While seats are being considered for selection by a pag, they are not available for inclusion in the itinerary set of another pag until they are released. Also at booking time, since the stochastic cancellation pattern of passengers is known, it is determined whether and when the seats will be cancelled later in the simulation. As the simulation proceeds, all bookings to date are reviewed at the end of each simulated booking day, and any cancellation that is due is executed; the cancelled seats are returned to the specified aircraft inventory. Parallel processing threads are used since the CPU time needed to serve the requirements of a pag varies from pag to pag.
It depends on the market that is being used, the extent of the then-available itineraries, and the activity of other pags whose choice sets include some of the same flights. In the current Simulator version, approximately 27 million pags representing 42 million travellers are processed using 400 parallel threads, with a total running time of about 22 minutes. A 480-repetition Monte Carlo simulation therefore takes about 480 × 22 = 10,560 minutes (roughly 7.3 days), a little over a week, if a single computer is used. The example shown below used four computers, requiring about two days of execution time.
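The thread arrangement described above (pags handed to parallel workers, with seats under consideration by one pag unavailable to others until released) can be sketched with Python's `concurrent.futures.ThreadPoolExecutor` standing in for the C# BackgroundWorker threads. This is a hypothetical illustration: the `Inventory` class, `process_pag`, `run`, and the random stand-in for the choice model are assumptions, and it demonstrates the locking rule rather than parallel speed-up. Each pag's options are assumed to be distinct flights.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class Inventory:
    """Hypothetical shared seat inventory; a seat on hold is invisible to other pags."""

    def __init__(self, seats):
        self._free = dict(seats)          # flight -> seats neither sold nor on hold
        self._lock = threading.Lock()

    def hold_available(self, flights):
        """Atomically put one seat on hold on every requested flight that has one."""
        with self._lock:
            held = [f for f in flights if self._free.get(f, 0) > 0]
            for f in held:
                self._free[f] -= 1
            return held

    def release(self, flights):
        """Return held-but-unchosen seats to the pool."""
        with self._lock:
            for f in flights:
                self._free[f] += 1

    def free(self, flight):
        with self._lock:
            return self._free[flight]


def process_pag(inventory, options, seed):
    """One pag: hold its choice set, pick one option, release the rest."""
    rng = random.Random(seed)             # per-pag generator: no RNG state shared across threads
    held = inventory.hold_available(options)
    if not held:
        return None                       # nothing available while other pags hold seats
    chosen = rng.choice(held)             # stand-in for the discrete choice model
    inventory.release([f for f in held if f != chosen])
    return chosen                         # the chosen seat stays removed: it is booked


def run(pag_requests, seats, workers=8):
    """Submit pags in booking order; worker threads may finish in a different order."""
    inventory = Inventory(seats)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda args: process_pag(inventory, *args), pag_requests))
    return results, inventory


results, inventory = run([(["F1", "F2"], seed) for seed in range(10)], {"F1": 3, "F2": 2}, workers=4)
print(results, inventory.free("F1"), inventory.free("F2"))
```

Because a pag holds its whole choice set while deciding, a concurrent pag can briefly find nothing available even though a seat is about to be released, which is one reason thread scheduling adds variability to a run.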
4. Example Simulation Results

The results of the agent-based model simulation run contain the bookings of each of the more than twenty-seven million pags on the several hundred thousand flights found worldwide in a week. Usually these results are aggregated to represent some specific aspect of the air travel system of interest. For example, a tabular result can be produced by the Simulator for a specific market (e.g., Seattle, WA, USA to Miami, FL, USA) with detail down to an individual flight, showing the number of tickets sold and the revenue earned in each fare class on each flight in every itinerary serving that market. An airline serving that market can then adjust the departure time of a specific flight, the fare-class fare structure, or the capacity of the aircraft, and see the likely resulting revenue impact.

Another example result is shown in Figure 2. The curve represents the underlying random effects produced by the discrete choice protocol, which stem from the inherent variation in passenger itinerary choice. It is a graph of 480 simulation iterations of the demand for a specific air service in a range of markets. The mean demand, as a proportion of the maximum possible market demand, is indicated by the heavy green vertical line, and the thin green lines show the left and right 95% confidence interval values, also as a proportion of maximum demand. The figure is thus a picture of the inherent variation that can be expected in estimating air travel market demand under any circumstances.
Figure 2: The confidence interval for air travel OD demand estimates (480 iterations).
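A figure of the kind shown in Figure 2 summarizes repeated runs by their mean and an empirical 95% interval. The sketch below reproduces that summary for a deliberately simple, hypothetical market: in each of 480 iterations, every pag independently chooses the service of interest with a fixed probability, so the spread comes only from choice randomness. The function names and parameters are assumptions; the Simulator's runs also include interaction between pags through seat inventory, which this toy model omits.

```python
import random
import statistics


def run_iterations(n_pags, p_choose, iterations, seed):
    """Share of a market captured by one service in each Monte Carlo iteration."""
    rng = random.Random(seed)
    shares = []
    for _ in range(iterations):
        chosen = sum(1 for _ in range(n_pags) if rng.random() < p_choose)
        shares.append(chosen / n_pags)
    return shares


def summarize(shares):
    """Mean share and an empirical 95% interval (2.5th and 97.5th percentiles)."""
    cuts = statistics.quantiles(shares, n=40)   # 39 cut points at 2.5% steps
    return statistics.mean(shares), cuts[0], cuts[-1]


shares = run_iterations(n_pags=500, p_choose=0.3, iterations=480, seed=1)
mean, lo, hi = summarize(shares)
print(f"mean={mean:.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```

Even with a fixed choice probability and no capacity limits, the interval is several percentage points wide, which illustrates why a single simulation run cannot pin down market demand exactly.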
5. Conclusions

The AirMarkets Simulator model ties together passenger characteristics (the value of money, the value of time, comfort and familiarity) with air travel itinerary attributes (fare, flight duration, stops, departure/arrival times, cabin, and airline engagement). The result is a highly detailed representation of the travel on individual aircraft flights throughout the world for a week of scheduled travel, incorporating the behavior of over 27 million passenger agents representing more than 42 million individual passengers.

Figure 2 also illustrates one of the essential results that is only available because of the agent-based architecture of the AirMarkets Simulator. There is no other way to estimate the inherent variability of travel demand, which arises from the dependence of later flight options on the prior selections of earlier agents. This inherent variability is one of the key phenomena that must be understood to define air travel service optimally. It is also a worthy example of the value of agent-based modelling. Without the Simulator, it would be essentially impossible to evaluate a proposed change in air service without actually instituting the service modifications and measuring the result with realized revenue. A commercial air flight typically costs over $100,000, so even one experimental flight costs literally thousands of times more than using an agent-based model. (The AirMarkets Simulator takes about a third of an hour to execute on a typical 8-processor desktop computer.)

The choice model leads the way to further research in many directions. Most important is an extension of the representation of adaptivity suggested by the airline preference and use terms in the current model. Current AirMarkets research, for example, is directed toward incorporating on-demand air travel, such as charter service, into the system, and one of the initial investigations is directed to understanding the awareness, or lack of it, among the broad set of air travelers that the choice protocol suggests would find that option very desirable.
References
1. Carson, R., R. Parker, and T. Cenesizoglu (2011), "Forecasting Aggregate Demand for US Commercial Air Travel", International Journal of Forecasting, 27:923-941.
2. Parker, R. (2010), Virtual Markets: The Application of Agent-Based Modelling to Marketing Science, PhD Dissertation, University of Technology Sydney, Sydney, Australia.
3. McFadden, D. (1980), "Econometric Models of Probabilistic Choice Among Products", Journal of Business, 53:513-529.
4. Ben-Akiva, M. and S. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, MIT Press.
5. Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge University Press.
6. Louviere, J., D. Hensher and J. Swait (2000), Stated Choice Methods, Cambridge University Press.
7. Parker, R. and J. Walker (2007), Determining the Passenger Value of Departure and Arrival Times Using a Mixed Logit Model, AGIFORS Scheduling and Strategic Planning Study Group, Tokyo, Japan.
8. Fox, C. and T. Bond (2007), Applying the Rasch Model: Fundamental Measurement in the Human Sciences, 2nd ed., Lawrence Erlbaum Associates.
9. von Davier, M. and C. Carstensen (2007), Multivariate and Mixture Distribution Rasch Models, Springer.
10. Parker, R. (2010), op. cit., pp. 56-05.
11. Troscheutz, S. (2007), .NET Random Number Generators and Distributions, http://www.codeproject.com/KB/recipes/Random.aspx.