Annals of Tourism Research, Vol. 29, No. 2, pp. 320–337, 2002. © 2002 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0160-7383/02/$22.00
PII: S0160-7383(01)00065-2
VACATION BEHAVIOR USING A SEQUENCE ALIGNMENT METHOD

Bertine Bargeman
Tilburg University, The Netherlands
Chang-Hyeon Joh
Harry Timmermans
Eindhoven University of Technology, The Netherlands

Abstract: The classification and analysis of tourists continues to be an important research theme. Existing typologies are typically based on cross-sectional information of tourists’ choice behavior. Consequently, the sequential data embedded in vacation histories is not explicitly considered when developing a typology of vacation behavior. The purpose of this paper is to suggest the use of sequence alignment methods to derive such a typology, incorporating the embedded sequential information. The quintessence of sequence alignment methods is explained and the results of an application to Dutch vacation history data are reported. Keywords: sequence alignment, decision-making processes, typology, vacation behavior. © 2002 Elsevier Science Ltd. All rights reserved.

Résumé: Le comportement de choix de vacances analysé par une méthode d’alignement de séquence. La classification et l’analyse des touristes demeure un domaine de recherche important. Les typologies existantes sont généralement basées sur des données transversales concernant le comportement de choix des touristes. Par conséquent, l’information séquentielle incluse dans l’historique des vacances n’est pas explicitement prise en considération dans le développement d’une typologie du comportement des vacanciers. L’objectif de cet article est donc de suggérer l’utilisation des méthodes d’alignement de séquence de manière à incorporer l’information séquentielle ou temporelle qui est comprise dans l’historique de vacances. La quintessence des méthodes d’alignement de séquence est présentée et les résultats d’une application à des données néerlandaises d’historique de vacances sont rapportés. Mots-clés: alignement de séquence, processus de prise de décision, typologie, comportement de choix de vacances.
Bertine Bargeman and Chang-Hyeon Joh are at Tilburg University and the Eindhoven University of Technology, respectively; the former is a specialist on tourist decision-making processes and the latter a specialist on the analysis of activity patterns. Harry Timmermans (Urban Planning Group, PO Box 513, 5600 MB Eindhoven, The Netherlands) has published over 100 refereed articles on modeling and computer systems in tourism, recreation, retailing, transportation, and other fields of application.

INTRODUCTION

Faced with rapidly increasing complexity in tourism products and competition, researchers have long seen the need to classify tourists. Such typologies offer managers a better understanding of the interrelationships between their products and services, those of competitors, and tourist demand, allowing better, or at least better-informed,
strategic and tactical marketing decisions. An examination of the literature indicates that typologies can be based on various kinds of information. First, there are several sociological, phenomenological, and other nonempirical typologies based on theoretical notions. Perhaps the best known of these is Cohen’s typology, which is based on the degree to which tourists seek novelty or familiarity (Cohen 1972, 1979). As suggested by Mo, Howard and Havitz (1993), empirical testing of such typologies has been rare (Plog 1990, 1991; Polovitz Nickerson and Ellis 1991; Smith 1990a, 1990b; Snepenger 1987). Further, these typologies often seem to have little relevance for decision-making, as the implied types are often difficult to identify.

In contrast, empirical typologies often lack a theoretical orientation. A simple approach, used frequently especially in the early days, relies on past behavior as expressed by a single variable such as distance traveled (Etzel and Woodside 1982), amount of expenditures (Spotts and Mahoney 1991), frequency of travel (Woodside, Cook and Mindak 1987), activity choice (Hsieh, O’Leary and Morrison 1992; Madrigal and Kahle 1994), or destination (Fodness and Milner 1992; Lang, O’Leary and Morrison 1997; Willenborg and Woodside 1976). Benefit segmentation employs psychographics rather than observed behavior (Cha, McCleary and Uysal 1995; Crask 1981; Mazanec 1984; Loker and Perdue 1992; Shoemaker 1994; Thrane 1997). Attempts to segment a particular market by traveler or demographic characteristics have also been very popular (Andereck and Caldwell 1994; Hsu and Sung 1997; Javalgi, Thomas and Rao 1992; Mudambi and Baum 1997). Yet others (Lang and O’Leary 1997) have advocated a multi-segmentation approach based on motivational, participation, and preference dimensions.
This brief literature review suggests that perhaps the dominant approach to building a typology is to group consumers into segments such that the resulting segments are homogeneous with respect to their response to marketing mix variables or in terms of their vacation behavior. Typically, some multivariate statistical technique such as cluster analysis, factor analysis, discriminant analysis, or multidimensional scaling is used to find the required segments, using cross-sectional data. Consequently, existing typologies do not incorporate any information on vacation behavior over time. Notions of loyalty or variety-seeking behavior are rarely included in the analysis, although some literature examines and compares the behavior of first-time and repeat tourists to destinations (Fakeye and Crompton 1991; Ghyte and Phelps 1989; Gitelson and Crompton 1984; Opperman 1996, 1997). It does not appear that typologies derived from panel data, incorporating sequential information, yet exist.

One can only speculate why existing typologies do not incorporate such information. Obviously, information about concepts such as loyalty and vacation patterns over time demands a very rich set of panel data that few analysts are likely to gather. The costs and labor involved are such that a major sponsor or a combination of small sponsors is required to collect panel data. Another reason might be the lack of awareness of an appropriate statistical technique. In the majority of
previous studies, representative segments are typically identified by calculating a similarity measure between tourists. Similarities are derived from a set of quantitative and/or qualitative variables, using a Euclidean or generalized distance measure. These measures, however, do not capture any sequential differences between vacation history patterns of tourists. The measures compare the relevant vacation patterns in a position-based manner, but this comparison does not involve differences in the attribute orders between patterns. Moreover, these similarity indices are insensitive to any differences in length between vacation patterns. The present paper focuses on the possibility of using a sequence alignment method to derive a typology of vacation behavior. The sequence alignment method was originally introduced in disciplines such as molecular biology, chromatography, and speech recognition. It has the interesting feature that it employs biological distance rather than geometric (Euclidean) distance as the basic concept of comparison. The measure captures the sequential difference among strings of information, and hence constitutes a potentially viable approach for measuring similarity among vacation history patterns. Unlike conventional similarity measures, it can also cope with patterns of different length. There is always some risk of adopting techniques developed in another field to tourism analysis. But sequence alignment methods represent an interesting way of including differences in sequence structures (or differences in vacation history patterns) into a similarity measure that can be used to classify tourists. Unlike other constructs, this allows one to base typologies/segmentations on information about loyalty, variety-seeking behavior, repeat patterns over time (within and between years), and the like, providing more information for marketing strategies. 
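The contrast between a position-based comparison and an alignment-based one can be illustrated with a minimal sketch. It assumes two hypothetical vacation histories coded at the level of the vacation (A for abroad, D for domestic) and uses a plain unit-cost edit distance rather than the weighted measure discussed later in the paper; the function names and example strings are illustrative and do not come from the study.

```python
def position_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("position-based comparison needs equal lengths")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance: insertion, deletion, substitution cost 1."""
    prev = list(range(len(b) + 1))          # row for the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting the first i elements
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (x != y),  # identity/substitution
                            curr[j - 1] + 1,          # insertion
                            prev[j] + 1))             # deletion
        prev = curr
    return prev[-1]


early = "ADADADAD"  # hypothetical history: abroad, domestic, abroad, ...
late = "DADADADA"   # the same alternation, shifted by one vacation

print(position_mismatches(early, late))  # every position differs
print(edit_distance(early, late))        # one deletion plus one insertion
print(edit_distance("ADAD", "ADADAD"))   # unequal lengths are handled
```

In this toy case the position-based count treats the two shifted histories as maximally different, whereas the edit distance recognizes that one deletion and one insertion make them identical, and it remains defined when the histories differ in length.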
VACATION BEHAVIOR

A vacation pattern or history describes the behavior of any particular tourist over time. In its simplest form, it describes the timing and duration of the vacation over some given time horizon. More complex forms might in addition contain information about the destination (such as domestic vs. international), the type of accommodation, the transport mode used, and more. Vacation patterns reflect the outcomes of very complex decision-making processes of individuals and households, who are part of a social system with its typical norms, routines, habits, culture, institutions, and the like. Social systems are characterized by production and reproduction mechanisms as reflected in daily activity patterns. Vacations are part of such patterns. Depending on the time spent on other activities and on institutional constraints, such as the number of vacation days, individuals and households face the problem of how to allocate the total number of these days over time and space. This allocation task is influenced by both positive and negative factors. People’s desires, motivations, degree of commitment, and involvement will dictate the general latent demand for vacation. Various kinds of constraints (financial, social, personal) will limit the possibilities for pursuing or actually implementing the desires.

For the purpose of deriving a typology of behavior, it is important to realize that vacation histories have both a temporal and a spatial dimension (Opperman 1992). The available number of vacation days may be allocated in many different ways over time and space. Moreover, these dimensions are interrelated in the sense that travel to more distant destinations implies a shorter actual vacation once travel time is excluded. To fully capture these dimensions, several pieces of information are required.

In terms of the temporal dimension, for example, individuals may decide to spend all of their vacation days at once or, at the other extreme, to spend only one day at a time. Hence, the first aspect of the temporal dimension is frequency, defined here as the number of vacations within a given time horizon, such as a year. Frequency is strongly related to a second aspect of the temporal dimension: duration, which is defined as the number of consecutive days of a particular vacation period. If the total number of days is divided by frequency, the average duration of periods can be derived. These two aspects, however, still leave open the question of when the vacations take place. Individuals and households may decide, unless they are confined to official periods, to spend their vacations in, for instance, winter or summer, or in January, April, or August. Therefore, a third aspect of the temporal dimension is timing, defined as the part of the year of departure for a particular vacation. Further, there is the temporal sequence of a vacation history, defined as the sequence of timing decisions over successive years.

Similar aspects can be identified for the spatial dimension. First, vacation decisions involve the choice of destination. The level of detail in classifying or representing the choice aspect varies, depending on one’s objectives.
The simplest way of representing destination choice is a dichotomous one, involving the distinction between a domestic and an international vacation. More detailed classifications, such as by country or city region, are possible to describe this spatial aspect of any history. Another interesting spatial aspect of a pattern is whether an individual decides to go to the same destination on successive vacations or decides to go to a different destination. The former might be seen as a manifestation of destination-loyalty behavior, whereas the latter might be viewed as reflecting variety-seeking behavior, at least in terms of the destination involved. Thus, a second aspect of the spatial dimension underlying histories is spatial repetition, defined as the degree to which individuals and households choose the same destination at consecutive vacations. This is similar to the trip index typically used for a single vacation (Pearce 1995; Pearce and Elliott 1983; Uysal and McDonald 1989). A third aspect is the spatial sequence of any particular vacation history, defined as the sequence of destination decisions over successive years.

Thus, conceptually it is argued that vacation histories can be described in terms of a temporal and a spatial dimension, which in turn are assumed to reflect several aspects. Each of these aspects contains a particular source of information that may be used to characterize any particular vacation pattern. The aim of this article is to derive a typology of vacation behavior, based on these various aspects.

Unidimensional Sequence Alignment

A vacation history can be represented as a string of information. For example, one may choose characters to represent whether a vacation is domestic or international. The sequence of vacations across some time horizon can then be represented as a string of characters, each representing a particular vacation. Alternatively, one may identify some time frame (such as a day) and use characters to denote whether that day was a vacation day, and if so, domestic or foreign. The sequence alignment method, originally introduced for comparing different DNA and RNA structures, was developed to measure the biological distances among such series of alphabetic characters.

According to Kruskal (1983), the fundamental features of the sequence alignment method can be summarized as follows. Consider two sequences s and g with m+1 and n+1 elements, respectively (positions 0 to m and 0 to n, where position 0 denotes the empty initial segment). These s and g sequences are regarded as the source sequence and the target sequence, respectively. Each element of these sequences or strings contains a particular alphabetic character. The question is how similar these two strings are. The sequence alignment method defines similarity as the total amount of effort required to change s into g. Various operations are allowed. In particular, sequences can be made equal by using identity, substitution, insertion, and deletion operations. Their meaning can be best appreciated by examining the following two-dimensional comparison table or computational array.
Each cell (i,j) stores the amount of effort to equalize si with gj. The grayed cell is called the corner and is set to 0, indicating the comparison of two empty sequences. The cells in margin g are filled when s0 is equalized to gj’s. Likewise, the cells in margin s are filled when si’s are equalized to g0. The equalization process starts at cell (0,0) and ends at cell (m,n). The operations are represented by step-by-step moves (arrows in the table) from one cell to another, first to equalize each
initial segment, si with gj, and finally to change sequence sm into gn. A set of moves from the first to the last cell composes a trajectory, a path of equalizations. For each cell (i,j), these moves can be made from one of three cells, (i−1,j), (i−1,j−1), and (i,j−1), called predecessors. Moves in the opposite direction are not allowed because they involve additional, unnecessary effort. Each of these moves represents an operation. The diagonal move represents identity if the elements si and gj are the same at cell (i,j), and represents a substitution if si and gj are not the same. In both cases, the predecessor of cell (i,j) is cell (i−1,j−1). A horizontal move, where the predecessor is (i,j−1), represents an insertion in that the move adds element gj to the initial segment si. Further, a vertical move, where the predecessor is (i−1,j), represents a deletion in that the move eliminates element si from the initial segment si. It is important to note that these operations are simply alternative basic mechanisms to make one string equal to another. They do not have any immediate behavioral interpretation.

To illustrate these definitions and operations, it is assumed that sequence ACB has to be equalized to sequence ABC. One possible way to do this is to first insert an A in front of the existing A, then substitute the existing A by a B, identify C by C, and finally delete B at the end. These equalization efforts involve one horizontal move, two diagonal moves, and one vertical move in the comparison table, as shown below.
The order of the characters above and to the left of the table indicates the specific positions of the elements in the source and target sequence, respectively, and designates initial segments of g and s, respectively. Hence, the source sequence s has initial segments s0 (null), s1 (A), s2 (AC) and s3 (ACB), and the target sequence g has g0 (null), g1 (A), g2
(AB) and g3 (ABC). The initial segments of s are transformed into the corresponding initial segments of g. The horizontal move equalizes the null initial segment s0 with g1 by inserting an element A. This equalization changes the initial segment s0 (null) to s0 (A). The first diagonal move equalizes the transformed initial segment s1 (AA) with g2 (AB) by substituting the second A by an element B. The next diagonal move equalizes the transformed initial segment s2 (ABC) with g3 (ABC) by substituting C by the same element C. This equalization, however, changes nothing in s2, and thus this move is an identity rather than a substitution. The vertical move equalizes the transformed initial segment s3 (ABCB) with g3 (ABC) by deleting B. As a result, the entire sequence s is now equalized with g.

Each operation involves a certain amount of effort. The magnitude of this effort, which is chosen by the researcher, is denoted by a weighting value or weight: we(si,gj), ws(si,gj), wd(si,φ), and wi(φ,gj) for the identity (si = gj), substitution (si ≠ gj), deletion, and insertion operations, respectively. The symbol φ implies that no operation is applied to the element denoted by φ. In particular, we(si,gj) = 0 if si = gj, ws(si,g0) = wd(si,φ) if i ≥ 1, and ws(s0,gj) = wi(φ,gj) if j ≥ 1. When a missing character si is to be substituted with a missing character gj, the substitution weighting value or the identity weighting value is applied, depending on the application context. The substitution operation may be thought of as the sum of a deletion and an insertion operation. That is, ws(si,gj) = δ[wd(si,φ) + wi(φ,gj)]. If the substitution weighting value is regarded as the simple summation of the weighting values of the two operations, then δ = 1; otherwise, δ ≠ 1. Normally, ws(si,gj) > wd(si,φ) and ws(si,gj) > wi(φ,gj). The computation of similarity proceeds by using these weights. Similarity is then defined as the sum of operation weighting values assigned to change sequence s into g.
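As a minimal sketch of how one trajectory translates into a total effort, the following code applies the four operations described above to change ACB into ABC and sums their weights. The weights (insertion = deletion = 1, δ = 1, hence substitution = 2) are illustrative choices, and the function and variable names are hypothetical.

```python
from typing import List, Tuple

# Illustrative, researcher-chosen weights (an assumption for this sketch).
W_INSERT = 1
W_DELETE = 1
DELTA = 1
W_SUBSTITUTE = DELTA * (W_DELETE + W_INSERT)   # equals 2 here

Op = Tuple[str, str]  # (operation name, character written, "" if none)


def apply_script(source: str, script: List[Op]) -> Tuple[str, int]:
    """Apply operations left to right; return (result, total effort)."""
    out: List[str] = []
    pos = 0    # index of the next unread source element
    cost = 0
    for name, char in script:
        if name == "insert":        # write char, do not consume the source
            out.append(char)
            cost += W_INSERT
        elif name == "delete":      # consume a source element, write nothing
            pos += 1
            cost += W_DELETE
        elif name == "substitute":  # consume a source element, write char
            if source[pos] == char:
                raise ValueError("substituting an equal element is identity")
            out.append(char)
            pos += 1
            cost += W_SUBSTITUTE
        elif name == "identity":    # copy the source element at zero cost
            out.append(source[pos])
            pos += 1
        else:
            raise ValueError(f"unknown operation: {name}")
    if pos != len(source):
        raise ValueError("script did not consume the whole source")
    return "".join(out), cost


# Insert A, substitute A by B, identify C by C, delete the final B.
script = [("insert", "A"), ("substitute", "B"), ("identity", ""), ("delete", "")]
result, effort = apply_script("ACB", script)
print(result, effort)
```

Under these weights the trajectory costs 1 + 2 + 0 + 1 = 4. Other trajectories for the same pair of sequences can be cheaper, which is why a rule for selecting among trajectories is needed.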
Now, it will be evident that the above example is only one of many possible ways to move around the array from (0,0) to (m,n). Each cell that is not in the margin has three predecessors, and hence different trajectories are possible. Consider, for example, the sequences s and g, each with two elements AC and AB, respectively. Even this simple case already has as many as 13 trajectories. Each trajectory (denoted by T) implies the following alignments, where the initial sequence of s is AC. Because each comparison case has that many different possible trajectories, an additional operational decision is required to calculate the similarity measure. The sequence alignment method is based on the calculation of the Levenshtein distance, defined as the smallest number of substitutions, insertions, and deletions required to change s into g. The equation for the weighted Levenshtein distance is:
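$$d(s,g) = \frac{1}{m+n}\, d(s^m, g^n) \tag{1}$$

$$d(s^0, g^0) = 0 \tag{2}$$

$$d(s^0, g^j) = d(s^0, g^{j-1}) + w_i(\phi, g_j) \tag{3}$$

$$d(s^i, g^0) = d(s^{i-1}, g^0) + w_d(s_i, \phi) \tag{4}$$

$$d(s^i, g^j) = \min\left[\, d(s^{i-1}, g^{j-1}) + w(s_i, g_j),\; d(s^i, g^{j-1}) + w_i(\phi, g_j),\; d(s^{i-1}, g^j) + w_d(s_i, \phi) \,\right] \tag{5}$$

with

$$w(s_i, g_j) = \begin{cases} w_e(s_i, g_j) = 0 & \text{if } s_i = g_j \\ w_s(s_i, g_j) > 0 & \text{if } s_i \neq g_j \end{cases} \tag{6}$$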
where i, j ≥ 1; superscripts denote initial segments and subscripts denote single elements, so that s^i is the initial segment consisting of the first i elements of s and s_i is its i-th element; d(s,g) is the total cost of equalization of s (= s^m) with g (= g^n); m and n are the numbers of elements in sequences s and g, respectively; and d(s^i,g^j) is the cost of equalization of s^i with g^j, cumulated from the equalization of s^0 to g^0.

For example, the following weights can be assigned: insertion wi(φ,g_j) = deletion wd(s_i,φ) = 1 and δ = 1, and hence ws(s_i,g_j) = 2. The 13 trajectories then give the following computational results: T1: 1+1+1+1=4; T2: 1+2+1=4; T3: 1+1+1+1=4; T4: 1+1+2=4; T5: 1+1+1+1=4; T6: 0+1+1=2; T7: 0+2=2; T8: 0+1+1=2; T9: 1+1+1+1=4; T10: 1+2+1=4; T11: 1+1+1+1=4; T12: 1+1+2=4; T13: 1+1+1+1=4. The Levenshtein function will identify T6, T7, and T8 as the optimal trajectories:

d(s^0,g^0) = 0
d(s^0,g^1) = d(s^0,g^0) + wi(φ,g_1) = 1
d(s^0,g^2) = d(s^0,g^1) + wi(φ,g_2) = 2
d(s^1,g^0) = d(s^0,g^0) + wd(s_1,φ) = 1
d(s^1,g^1) = min[d(s^0,g^0) + w(s_1,g_1), d(s^1,g^0) + wi(φ,g_1), d(s^0,g^1) + wd(s_1,φ)] = min[0,2,2] = 0
d(s^1,g^2) = min[d(s^0,g^1) + w(s_1,g_2), d(s^1,g^1) + wi(φ,g_2), d(s^0,g^2) + wd(s_1,φ)] = min[3,1,3] = 1
d(s^2,g^0) = d(s^1,g^0) + wd(s_2,φ) = 2
d(s^2,g^1) = min[d(s^1,g^0) + w(s_2,g_1), d(s^2,g^0) + wi(φ,g_1), d(s^1,g^1) + wd(s_2,φ)] = min[3,3,1] = 1
d(s^2,g^2) = min[d(s^1,g^1) + w(s_2,g_2), d(s^2,g^1) + wi(φ,g_2), d(s^1,g^2) + wd(s_2,φ)] = min[2,2,2] = 2

The total cost d(s^2,g^2) is the same for T6, T7 and T8 because all three predecessors of the last cell yield the same cumulative cost of 2. In T6, for example, d(s^2,g^2) has as its predecessor d(s^1,g^2) rather than d(s^1,g^1) or d(s^2,g^1). Similarly, d(s^1,g^2) has d(s^1,g^1) rather than d(s^0,g^1) or d(s^0,g^2). Finally, d(s^1,g^1) has d(s^0,g^0) rather than d(s^1,g^0) or d(s^0,g^1). The alignment steps based on the Levenshtein function exactly reproduce the optimal trajectories T6, T7, and T8. Similarity is defined as the smallest sum of operation weighting values required to change s into g. The Levenshtein function searches all three possible moves for each cell, and the moves chosen at each comparison step may be an element of the optimal trajectory.
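The recursion and the worked example above can be checked with a short sketch. It assumes the same illustrative weights (insertion = deletion = 1, substitution = 2); the function names are hypothetical, and the brute-force enumeration is included only to confirm the count of 13 trajectories for two-element sequences, not as a practical algorithm.

```python
from typing import Iterator, List


def weighted_levenshtein(s: str, g: str, w_ins: float = 1.0,
                         w_del: float = 1.0, w_sub: float = 2.0) -> float:
    """Fill the comparison table and return the total cost d(s^m, g^n)."""
    m, n = len(s), len(g)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]   # corner cell is 0
    for j in range(1, n + 1):                     # margin: insertions only
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, m + 1):                     # margin: deletions only
        d[i][0] = d[i - 1][0] + w_del
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = 0.0 if s[i - 1] == g[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j - 1] + w,    # identity or substitution
                          d[i][j - 1] + w_ins,    # insertion
                          d[i - 1][j] + w_del)    # deletion
    return d[m][n]


def trajectory_costs(s: str, g: str, w_ins: float = 1.0,
                     w_del: float = 1.0, w_sub: float = 2.0) -> List[float]:
    """Cost of every trajectory from cell (0,0) to cell (m,n)."""
    m, n = len(s), len(g)

    def walk(i: int, j: int, cost: float) -> Iterator[float]:
        if i == m and j == n:
            yield cost
            return
        if i < m and j < n:                       # diagonal move
            w = 0.0 if s[i] == g[j] else w_sub
            yield from walk(i + 1, j + 1, cost + w)
        if j < n:                                 # horizontal move
            yield from walk(i, j + 1, cost + w_ins)
        if i < m:                                 # vertical move
            yield from walk(i + 1, j, cost + w_del)

    return list(walk(0, 0, 0.0))


costs = trajectory_costs("AC", "AB")
print(len(costs), min(costs), costs.count(min(costs)))
print(weighted_levenshtein("AC", "AB"))
```

For s = AC and g = AB the enumeration yields 13 trajectories, three of which attain the minimum cost of 2, in agreement with the table-filling recursion. The function returns the unnormalized total; dividing by m+n would give a length-normalized value.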
The optimal trajectory or trajectories producing the similarity score can be found by backtracking, that is, by tracing the cells in reverse order from the last to the first.

The data used to derive the required typology come from the Dutch “Continuous Vacation Panel”, a panel of respondents who regularly
report their vacation behavior. The panel started in 1980 and is still in the field. It is refreshed annually; here, only those respondents who were in the panel continuously from 1991 until 1994 are used. The sample size is 1,163 individuals, representing a random selection of the Dutch population. Note, however, that all respondents went on holiday at least once during this four-year period. The data set contains many different variables regarding personal characteristics of the panelists and their vacation behavior, but for the analyses reported in this article, only the following variables were used: the timing of the vacation (month of departure), the duration of the vacation (number of days), and the destination of the vacation (classified into domestic and abroad). Note that the temporal and spatial aspects discussed earlier can be derived from these variables.

Data Analysis

The description of the sequence alignment method indicates that, to prepare the data for analysis, the information embedded in the vacation histories of the panelists should be translated into character strings. Therefore, every domestic vacation day was coded by a D and every vacation day abroad by an A. Strings were constructed such that they represented the days of the year. If a panelist was not on vacation on any particular day, an H was recorded. Thus, the vacation history of each respondent was coded by a string of D-s, A-s and H-s. Each string consists of 1,461 characters, reflecting the total number of days between 1991 and 1994. The duration of a vacation period is reflected by the number of successive D-s or A-s in the sequence. Thus, five successive A-s depict a duration of five vacation days abroad. The aspect of timing is depicted by the position of an H?H segment in the string.
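The coding scheme just described can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the record layout (departure date, duration, destination code), the function names, and the example vacations are hypothetical, and departures are placed on the first day of the month only for convenience.

```python
from datetime import date
from itertools import groupby
from typing import List, Tuple

START = date(1991, 1, 1)
END = date(1994, 12, 31)
N_DAYS = (END - START).days + 1   # 1,461 days, including the 1992 leap day

# Hypothetical record: (departure date, duration in days, "D" or "A").
Vacation = Tuple[date, int, str]


def encode_history(vacations: List[Vacation]) -> str:
    """Code each day as D (domestic), A (abroad) or H (not on vacation)."""
    days = ["H"] * N_DAYS
    for departure, duration, code in vacations:
        if code not in ("D", "A"):
            raise ValueError("destination code must be 'D' or 'A'")
        first = (departure - START).days
        if first < 0 or first >= N_DAYS:
            raise ValueError("departure outside the 1991-1994 window")
        for k in range(first, min(first + duration, N_DAYS)):
            days[k] = code
    return "".join(days)


def vacation_runs(history: str) -> List[Tuple[str, int]]:
    """Return (code, length) for each run of consecutive D-s or A-s."""
    return [(code, sum(1 for _ in run))
            for code, run in groupby(history) if code != "H"]


history = encode_history([
    (date(1991, 7, 1), 14, "A"),   # two weeks abroad in July 1991
    (date(1992, 5, 1), 5, "D"),    # five domestic days in May 1992
    (date(1993, 7, 1), 21, "A"),   # three weeks abroad in July 1993
])
print(len(history))
print(vacation_runs(history))
```

The run lengths recover the durations, the number of runs recovers the number of vacation periods, and the position of each run in the string carries the timing. One limitation of this sketch is that two back-to-back vacations with the same destination code would merge into a single run.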
Because the actual timing of a vacation in the data set was recorded only at the level of the month, the vacations at the beginning of the relevant segment of the string were more or less arbitrarily recorded. Consequently, the typology will not be sensitive to any timing differences within the month. Given the length of the strings, this operational decision is not expected to have major influence on the outcome of the analyses. The frequency aspect of any vacation history is captured by the number of HDH or HAH segments in the string. The destination aspect of the vacation history is captured by the nature of the characters used, D representing a domestic vacation and A representing a vacation abroad. Repetition is reflected in the AA–AD–DA–DD combinations of consecutive segments, representing a particular vacation period. As such, the complete sequence of characters of the string provides the basis for the temporal and spatial sequence. Having represented the vacation histories in terms of a string of information, the analysis calculated the amount of effort required to pairwise equalize the strings representing the vacation histories of different respondents, and classified the respondents, based on this measure. The readily available computer program CLUSTAL W was used for this analysis. However, this was not a straightforward task in that
this program was originally developed to align DNA strings, which generally are considerably less complicated than vacation histories. The program did not allow all panelists to be grouped simultaneously. To some extent, this was a problem of limited memory, but more fundamentally it was also a problem of computing time, as the procedure for strings of this size and this many respondents would have taken weeks to complete. The calculation of the pairwise similarity measure for 1,163 respondents involves 675,703 comparisons. As a possible solution to this problem, the following strategy for deriving the typology was developed: a subsample of panelists was identified, such that they span the observed vacation history patterns; the sequence alignment method was then used to group this subsample of panelists into homogeneous segments; and next, the resulting segments were used as seed points, with the remaining panelists assigned to these segments.

The subsample was selected by constructing multidimensional crosstabulations for the duration and destination variables used in the analysis. For each cell, respondents were randomly selected in proportion to the observed frequency. The total number of panelists in this subsample was 102, a reasonable size for the available budget of computing time. The sequences of these respondents were used as input for the CLUSTAL W program, which calculated the pairwise sequence similarities (assuming all weights to be equal to 1), constructed a dendrogram and performed a multiple alignment (Thompson, Higgins and Gibson 1994). The program is based on the neighbor-joining method (Saitou and Nei 1987). This process took about 20 hours of computing time on a mainframe computer. Next, the profiles of the remaining 1,061 panelists were aligned. In three different computer sessions, respectively 353, 353 and 355 different sequences were added to the first 102 sequences. Each session took approximately 105 minutes.
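The size of the computational problem and the general idea of the seed-based strategy can be illustrated with a small sketch. The nearest-seed rule below (smallest mean distance to the members of a seed group) is an assumption made for illustration; it is not a description of how CLUSTAL W adds sequences to an existing alignment, and all names and toy strings are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def n_pairs(n: int) -> int:
    """Number of distinct pairs among n sequences."""
    return n * (n - 1) // 2


def assign_to_seeds(remaining: Sequence[str],
                    seeds: Dict[str, List[str]],
                    distance: Callable[[str, str], float]) -> Dict[str, List[str]]:
    """Assign each sequence to the seed group with the smallest mean distance."""
    groups: Dict[str, List[str]] = {label: [] for label in seeds}
    for seq in remaining:
        best = min(
            seeds,
            key=lambda label: sum(distance(seq, member)
                                  for member in seeds[label]) / len(seeds[label]),
        )
        groups[best].append(seq)
    return groups


def toy_distance(a: str, b: str) -> float:
    """Placeholder distance for the toy data; any alignment distance fits here."""
    return float(sum(x != y for x, y in zip(a, b)))


print(n_pairs(1163))   # all panelists
print(n_pairs(102))    # the subsample only

seeds = {"domestic": ["DDHH", "DHDH"], "abroad": ["AAHH", "AHAH"]}
print(assign_to_seeds(["DDHD", "AAHA"], seeds, toy_distance))
```

Comparing all 1,163 panelists requires 675,703 pairwise alignments, whereas the 102-member subsample requires only 5,151, which indicates why clustering a subsample first and adding the remaining sequences afterwards reduces the computing time so sharply.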
The computer program PHYLIP was used to draw a dendrogram, and the final typology of panelists was based on a visual inspection of this graph. A total of eight groups or types was identified. The distribution of the number of respondents and of the number of vacations, respectively, across these groups is as follows: Group I (20.9%; 16.4%), Group II (11.1%; 16.3%), Group III (4.5%; 4.5%), Group IV (5.9%; 7.7%), Group V (9.5%; 12.8%), Group VI (8.9%; 9.8%), Group VII (21.6%; 21.3%), and Group VIII (17.7%; 11.1%).

Study Findings

This section describes the eight groups in terms of the various temporal and spatial aspects underlying the typology. To allow a comparison of the groups, a series of figures and tables was prepared. The actual description, however, will not be on an aspect-by-aspect basis but by group. Figures 1a–h portray the average number of domestic and foreign vacation days per month between 1991 and 1994. Note that the vertical scale of these figures differs. The figures allow an interpretation of the destination (domestic vs. international), the temporal and spatial sequence, and timing.

Figure 1. Average Number of Monthly Vacation Days by Group

The aspects of frequency, duration, and spatial repetition are described in the text. Frequency denotes the average number of vacations per year and duration is equal to the average number of days per trip. Spatial repetition was measured as the average number of times that the same destination was visited on two successive vacations across a history. Based on these results, the various groups can be described as follows.

Group I consists of respondents who are primarily oriented toward the Netherlands as their holiday destination. In almost every month, the average number of domestic vacation days exceeds the average number of days spent abroad. July is the favorite month of this group, followed by August and June or May. For three years, July is also the month with the highest number of days spent abroad. The average frequency of this group is 1.3, which is relatively low. The average duration of their vacations is also relatively low, at 8.8 days. The degree of spatial repetition, on the other hand, is the highest among all eight identified groups, with a value of .71.

The vacation behavior of Group II is quite different. Its pattern is very stable. On average, these respondents spend eight days on domestic vacation and 20 days abroad, suggesting a foreign orientation. The average number of days per trip is 11.6, which makes this the group with the most days per trip. This group also scores highest in terms of frequency: the average number of vacation periods is 2.5. The degree of spatial repetition is about average, with a score of .59. In terms of timing, their favorite month for going abroad is July, followed by June, August, March, and, in some years, February. The latter month
may be suggestive of winter sports. May and October, and sometimes April, are the only months for which the average number of domestic vacation days is higher than the number of vacation days abroad. Figure 1 demonstrates that this group exhibits a clear temporal sequence in that the vacation peaks occur at very regular intervals.

Like Group I, Group III consists of respondents who are primarily domestically oriented. This is indicated by a high average number of domestic vacation days, which is even higher than the one observed for Group I; the number of vacation days spent abroad is also higher. On average, Group III spends 11 days in the Netherlands and 5 days abroad. The average number of days during a trip is 9.6, with these respondents taking 1.7 vacations per year. The degree of spatial repetition of this group is much lower than that of Group I: .64. Compared to the previous two groups, the vacations of Group III are more spread over time. Nevertheless, July, August, and June are the favorite months for a domestic holiday, while July and May are preferred for a vacation abroad.

The potential advantages of using a sequence alignment method are best illustrated by Groups IV and V. Both groups are primarily focused on foreign vacations, with a typical number of about 18 days per year. The average number of days per trip for both groups is 10, and the average number of vacations per year is 2.2 and 2.3 for Groups IV and V, respectively. Further, the degree of spatial repetition of both groups is very similar: .61 against .63. Hence, based on these aspects the two groups are very similar. The main difference between them concerns the actual sequence of the vacations, and this would have been more difficult to detect by conventional classification techniques. As demonstrated by Figure 1, vacations of Group V are spread more over time than those of Group IV. Group IV still has a peak in July, especially in 1993; Group V, on the other hand, does not have a clear peak.
The number of days spent abroad is relatively high throughout the period April–October. Moreover, July, except for 1991, is not the most popular month for vacationing. The average number of days spent abroad is higher than the number of domestic days in almost all months for both groups.

Figure 1 suggests that the general pattern of vacation days of Group VI is very similar to that of Group II. In terms of timing, the favorite month of this group is July, followed by June (1991 and 1992) or August (1993 and 1994). As for Group II, February is a popular month to go abroad. Figure 1 also demonstrates a clear temporal repetition, in that the vacation peaks occur at very regular intervals. Moreover, both groups have a more or less similar average number of vacation days. The difference between the two groups is primarily one not of pattern (sequence) but of spatial repetition and especially frequency. The degree of spatial repetition of Group VI is .62 (against .59 for Group II); the frequency is 1.9 vacations per year (against 2.5 for Group II).

Group VII is also relatively similar to Groups II and VI, except that the frequency and duration are significantly lower. The average number of annual trips for this group is only 1.7; the average duration is only 9.8 days. This group also has a lower degree of spatial repetition,
the relevant index being only .58. However, similar to Groups II and VI, Group VII is primarily oriented abroad, with July as the favorite month in three years. The average annual number of domestic vacation days ranges between 3 and 4, whereas the average annual number of vacation days spent abroad is between 11 and 14.

Furthermore, Group VIII could be characterized as the relatively inactive group. The average number of vacation days per trip is the lowest of all groups (7.3 days). The average frequency is also the lowest of all groups (1.1 trips per year). Perhaps to compensate for the low frequency, this group exhibits the highest degree of variety-seeking behavior, as indicated by a spatial repetition index of only .56. June and July are the most popular months to start a vacation abroad, whereas August is the most popular month for a domestic vacation. Figure 1 illustrates that this group shows the least stable pattern.

CONCLUSION

When deriving a typology of vacation behavior, the actual sequence of decisions is an important variable, in addition to the more conventional aspects such as frequency, destination, and duration. Sequence patterns provide managers with information about loyalty, repeat behavior, variety-seeking, recurrent behavior, and the like. Conventional cross-sectional classification techniques, however, do not allow one to incorporate the sequential information embedded in vacation history data. The purpose of this study was to suggest the use of sequence alignment methods as a potential means of deriving a typology that incorporates such sequential information. The quintessence of a unidimensional sequence alignment method has been discussed and its use has been illustrated using a Dutch vacation panel. The results of the application of the sequence alignment method to these panel data illustrate its potential value.
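As a rough illustration of the unidimensional comparison summarized above, the following sketch computes a Levenshtein distance between two day-coded vacation histories. It rests on several assumptions that are not taken from the study itself: the reading of the codes (H for a day at home, D for a domestic vacation day, A for a vacation day abroad), unit costs for insertion, deletion, and substitution, and the two 14-day example strings, which are purely hypothetical. The operation weights actually used in the analysis may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two coded strings.

    Assumption: insertion, deletion, and substitution each cost 1;
    identical characters cost 0.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca from a
                curr[j - 1] + 1,                  # insert cb into a
                prev[j - 1] + substitution_cost,  # identity or substitution
            ))
        prev = curr
    return prev[-1]


# Hypothetical 14-day histories (one character per day).
abroad_history = "HHAAAAAHHHHHHH"    # five days abroad
domestic_history = "HHDDDDDHHHHHHH"  # five domestic days in the same period

if __name__ == "__main__":
    print(levenshtein(abroad_history, domestic_history))
```

A full matrix of such pairwise distances could then serve as input to a clustering step. Because the cost of an operation does not depend on where in the string it occurs, the sketch also reflects the positional insensitivity raised as an issue for further research in the remainder of this section.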
The derived typology consisted of eight groups, which varied in terms of a series of underlying temporal and spatial aspects, including frequency, duration, timing, destination, temporal and spatial sequence, and spatial repetition. Interestingly, the main differences between some groups were primarily in terms of the actual sequence patterns of their vacations, and this aspect was well depicted by the sequence alignment method.

The results of the application of the sequence alignment method suggest that it is a flexible approach for analyzing sequential information, provided that particular additional operational decisions are made. In particular, because sample size tends to be relatively large in tourism research, a stagewise approach was suggested to allow one to deal with such large sample sizes. Obviously, the suggested stagewise approach implies that the final classification will be dependent upon the initial selection and grouping of respondents. Different initial selections will lead to different final classifications; the typology is not optimal in this respect. Although this might be an issue of concern, it should be realized that most clustering algorithms also lack optimality conditions. The final results of clustering algorithms are also dependent upon the initial seed points, and many algorithms only reallocate subjects once, thereby not even trying to reach an optimal solution. Hence, the non-optimality property of the sequence alignment method is not uncommon in classification research. The results of simulation efforts indicate that the influence of the initial selection on the final outcome will be limited, provided that the initial subsample of respondents is carefully selected and spans the range of vacation history patterns.

Hence, the main conclusion is that sequence alignment methods offer a potentially valuable approach to classification analysis if one is interested in the sequential aspects of the data at hand. This is not to say that the method cannot be improved for applications in tourism research. As is often the case when methods were originally developed for applications in other fields, a few issues deserve further research. First, the position of a character in a string of information does not influence the measure of similarity, as the Levenshtein distance is only based on identity, insertion, deletion, and substitution operators. While this may be a reasonable and valid approach in other disciplines, it warrants further examination and experimentation in tourism research. In particular, such endeavors should try to develop a similarity measure which is more sensitive to the number of positions in a string of sequential information that are implied by substitution operations. If such a measure could be successfully developed, the final typology might be more sensitive to differences between tourists in terms of the sequence of their vacation histories.

A second issue that deserves more attention is the generalization of unidimensional sequence alignment methods to multidimensional sequence alignment methods. The present classification is based on several temporal and spatial aspects only.
The actual coding into H, D and A implies that only a single dimension was used for the actual classification, as the temporal dimension was made implicit by deciding to take a day as the basic element of the strings. However, in other applications one might decide to explicitly use time as a second dimension, or one might be interested in additional dimensions of vacation behavior such as the nature of the activity (city vs. beach vs. nature, etc.), company, means of transportation, type of accommodation, and the like. One straightforward way of dealing with this multidimensional problem would be to choose a character for each possible combination of categories along each dimension. But this has the obvious disadvantage that one can no longer trace the interrelationships and interdependencies existing between the dimensions. Hence, there is clearly a need for a true multidimensional sequence alignment method. Based on the results reported in this paper and the prospects of such future research, (generalized) sequence alignment methods may represent an effective and powerful alternative to traditional classification techniques in tourism analysis for researchers interested in deriving typologies of tourism choice behavior.

Future research should also compare the sequence alignment method with alternative ways of incorporating time-sensitive aspects of tourist behavior into conventional classification methods. For example, provided that panel data are available, one could derive explicit measures of loyalty behavior (Popkowski and Timmermans 1997). If similar measures could be developed for variety-seeking and recurring subpatterns, these measures could be viewed as measures of the vacation history pattern, in addition to timing, frequency, and more. These measures could then be used as input to a cluster analysis to derive segments. The sequence alignment method would not require such pre-processing.

The kind of classification which is explicitly based on the structure of vacation histories provides potentially useful information for managers. For example, the description of the groups in terms of frequency allows them to differentiate between more and less active tourists. Likewise, the timing variable provides information about when they go on vacation. Similarly, spatial repetition tells something about their loyalty towards the destination. Marketing decisions can be tailored towards this information. It is also possible to link sociodemographic information to the grouping, giving additional data about the probability that a tourist with a certain profile belongs to a particular group.

Acknowledgements: The research project was funded by the Cooperation Center Tilburg and Eindhoven. The authors also wish to thank the Netherlands Research Institute for Recreation and Tourism and the Netherlands Board of Tourism for making available the Continu Vakantie Onderzoek panel for analysis. The development of the conceptual framework underlying the present study has benefited from discussions with Theo Beckers, Hugo van der Poel, and other members of the Department of Leisure Studies at Tilburg University. Thanks are extended to Peter van der Waerden of the Urban Planning Group at the Eindhoven University of Technology, who assisted in the conversion of the data files. Any errors remain the responsibility of the authors.
REFERENCES

Andereck, K. L., and L. L. Caldwell
  1994 Variable Selection in Tourism Market Segmentation Models. Journal of Travel Research 33(2):40–46.
Cha, S., K. W. McCleary, and M. Uysal
  1995 Travel Motivations of Japanese Overseas Travelers: A Factor-Cluster Segmentation Approach. Journal of Travel Research 34(1):33–39.
Cohen, E.
  1972 Towards a Sociology of International Tourism. Social Research 39(1):164–182.
  1979 A Phenomenology of Tourist Experiences. Sociology 13:179–201.
Crask, M. R.
  1981 Segmenting the Vacationer Market: Identifying the Vacation Preferences, Demographics and Magazine Readership of Each Group. Journal of Travel Research 20(1):29–34.
Etzel, M. J., and A. Woodside
  1982 Segmenting Vacation Markets: The Case of the Distant and Near-Home Travelers. Journal of Travel Research 20(4):10–14.
Fakeye, P. C., and J. L. Crompton
  1991 Image Differences Between Prospective, First-time, and Repeat Visitors to the Lower Rio Grande Valley. Journal of Travel Research 30(2):10–15.
Fodness, D. D., and L. M. Milner
  1992 A Perceptual Mapping Approach to Theme Park Visitor Segmentation. Tourism Management 13:95–101.
Gitelson, R. J., and J. L. Crompton
  1984 Insights into the Repeat Vacation Phenomenon. Annals of Tourism Research 11:199–217.
Gyte, D. M., and A. Phelps
  1989 Patterns of Destination Repeat Business: British Tourists in Mallorca, Spain. Journal of Travel Research 28(1):24–28.
Hsieh, S., J. T. O’Leary, and A. M. Morrison
  1992 Segmenting the International Travel Market by Activity. Tourism Management 13:209–223.
Hsu, C. H. C., and S. Sung
  1997 Travel Behaviors of International Students at a Midwestern University. Journal of Travel Research 36(1):59–65.
Javalgi, R. G., E. G. Thomas, and S. R. Rao
  1992 Consumer Behavior in the U.S. Pleasure Travel Marketplace: An Analysis of Senior and Nonsenior Travelers. Journal of Travel Research 31(2):14–19.
Kruskal, J. B.
  1983 An Overview of Sequence Comparison. In Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, D. Sankoff and J. B. Kruskal, eds., pp. 1–44. London: Addison-Wesley.
Lang, C-T., and J. T. O’Leary
  1997 Motivation, Participation, and Preference: A Multi-Segmentation Approach of the Australian Nature Travel Market. Journal of Travel and Tourism Marketing 6(3/4):159–180.
Lang, C-T., J. T. O’Leary, and A. M. Morrison
  1997 Distinguishing the Destination Choices of Pleasure Travelers from Taiwan. Journal of Travel and Tourism Marketing 6(1):21–40.
Loker, L. E., and R. R. Perdue
  1992 A Benefit-Based Segmentation of a Nonresident Summer Travel Market. Journal of Travel Research 31(1):30–35.
Madrigal, R., and L. R. Kahle
  1994 Predicting Vacation Activity Preferences on the Basis of Value-System Segmentation. Journal of Travel Research 32(3):22–28.
Mazanec, J. A.
  1984 How to Detect Travel Market Segments: A Clustering Approach. Journal of Travel Research 23(1):17–23.
Mo, C., D. R. Howard, and M. E. Havitz
  1993 Testing an International Tourist Role Typology. Annals of Tourism Research 20:319–335.
Mudambi, R., and T. Baum
  1997 Strategic Segmentation: An Empirical Analysis of Tourist Expenditure in Turkey. Journal of Travel Research 36(1):29–34.
Oppermann, M.
  1992 Intranational Tourist Flows in Malaysia. Annals of Tourism Research 19:482–500.
  1996 Visitation of Tourism Attraction and Tourist Expenditure Patterns: Repeat versus First-time Visitors. Asian Pacific Journal of Tourism Research 1(1):61–68.
  1997 First-time and Repeat Visitors to New Zealand. Tourism Management 18:177–181.
Pearce, D. G.
  1995 Tourism Today: A Geographical Analysis (2nd ed.). Essex: Longman.
Pearce, D. G., and J. M. C. Elliott
  1983 The Trip Index. Journal of Travel Research 22(1):6–9.
Plog, S. C.
  1990 A Carpenter’s Tools: An Answer to Stephen L. J. Smith’s Review of Psychocentrism/Allocentrism. Journal of Travel Research 28(2):43–45.
  1991 A Carpenter’s Tools Re-Visited: Measuring Allocentrism and Psychocentrism Properly … The First Time. Journal of Travel Research 29(4):51.
Polovitz Nickerson, N., and G. D. Ellis
  1991 Traveler Types and Activation Theory: A Comparison of Two Models. Journal of Travel Research 29(3):26–31.
Saitou, N., and M. Nei
  1987 The Neighbor Joining Method. Molecular Biology and Evolution 4(4):406–425.
Snepenger, D. J.
  1987 Segmenting the Vacation Market by Novelty-Seeking Role. Journal of Travel Research 25(3):8–13.
Shoemaker, S.
  1994 Segmenting the U.S. Travel Market According to Benefits Realized. Journal of Travel Research 32(3):8–21.
Smith, S. L. J.
  1990a A Test of Plog’s Allocentric/Psychocentric Model: Evidence From Seven Nations. Journal of Travel Research 28(4):40–42.
  1990b Another Look at the Carpenter’s Tools: A Reply to Plog. Journal of Travel Research 29(2):50–51.
Spotts, D. M., and E. M. Mahoney
  1991 Segmenting Visitors to a Destination Region Based on the Volume of their Expenditures. Journal of Travel Research 24(4):24–31.
Thompson, J. D., D. G. Higgins, and T. J. Gibson
  1994 CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research 22(22):4673–4680.
Thrane, C.
  1997 Values as Segmentation Criteria in Tourism Research: The Norwegian Monitor Approach. Tourism Management 18:111–113.
Uysal, M., and C. D. McDonald
  1989 Visitor Segmentation by Trip Index. Journal of Travel Research 27(3):38–41.
Willenborg, J. F., and A. G. Woodside
  1976 Segmentation of Vacation Attraction Market: Some Recursive Models. Proceedings of the 7th Annual Conference of the Travel Research Association, pp. 247–252. Salt Lake City UT: Bureau of Economic and Business Research, University of Utah.
Woodside, A. G., V. J. Cook, and W. A. Mindak
  1987 Profiling the Heavy Traveler Segment. Journal of Travel Research 25(4):9–14.
Submitted 8 January 1999. Resubmitted 17 January 2000. Accepted 1 September 2000. Final version 12 January 2001. Refereed anonymously. Coordinating Editor: Robert W. McLellan