Network approach to tourist segmentation via user generated content


Annals of Tourism Research 73 (2018) 35–47

Contents lists available at ScienceDirect

Annals of Tourism Research journal homepage: www.elsevier.com/locate/annals

Juan M. Hernández (a), Andrei P. Kirilenko (b), Svetlana Stepchenkova (b)

(a) Institute of Tourism and Sustainable Economic Development (TIDES), University of Las Palmas de Gran Canaria, c/Saulo Torón s/n, 35017 Las Palmas, Spain
(b) Department of Tourism, Recreation & Sport Management, College of Health and Human Performance, University of Florida, USA

ARTICLE INFO

Associate editor: Josef A. Mazanec

Keywords: Attractions; Network analysis; Social networks; Tourist segmentation; User-generated content (UGC)

ABSTRACT

The study contributes to the tourism literature by demonstrating an approach to segmenting tourists using network analysis with user-generated content. Online reviews of destination attractions are considered as a proxy for visitation data reflective of tourists’ interests. The connectivity between attractions is represented with a network of links created by tourists visiting and reviewing multiple attractions. Attraction clusters are revealed by segmenting this network using network analysis tools. Two segmentation solutions are provided: a posteriori, in which only review information is taken into account, and mixed, in which tourist groups are defined a priori by their travel interests and age, and this information is combined with visitation information. The findings are validated using geovisualization and by comparing them with randomly simulated models.

Introduction

Destinations around the world strive to increase their value by delivering intelligent, customized services to tourists. Destinations’ ability to provide such services is inherently connected to their capacity to collect, integrate, and analyze data from various sources and then redistribute that information to a stakeholder network of businesses, government agencies, policy-makers, as well as various organizations and activity groups (Gretzel, Werthner, Koo, & Lamsfus, 2015). Two elements, timely information and a strong network of interrelated entities, are essential in this process. The construction of tourism networks benefits the areas of learning and exchange (e.g., knowledge transfer, communications), business activities (e.g., cooperative marketing, purchasing, and production as well as enhanced cross-referrals), and community development (e.g., fostering common purpose, support for destination development, or increased sense of community) (Morrison, Lynch, & Johns, 2004).

Sources of the information that is distributed via such networks have been extended from surveys and collected statistics to digital traces that tourists leave at a destination and user-generated content (UGC) on various online forums and social networks. Despite concerns related to the validity of online data (Trend, 2013), it has been successfully demonstrated that UGC provides valuable information to tourism and hospitality services (Xiang & Gretzel, 2010), e.g., hotels (Vermeulen & Seegers, 2009) and restaurants (Zhang, Ye, Law, & Li, 2010).

Destination networks have been of interest to researchers for the last three decades (e.g., Jamal, Smith, & Watson, 2008; Tremblay, 1998), and the literature on the topic is still growing. Earlier studies focus on the evolution of business networks and interorganizational relationships (Morrison et al., 2004; Pavlovich, 2003; Tinsley & Lynch, 2001), tourism policies and governance



Corresponding author at: 312 Florida Gym, P.O. Box 118208, Gainesville, FL 32611-8208, USA. E-mail addresses: [email protected] (J.M. Hernández), andrei.kirilenko@ufl.edu (A.P. Kirilenko), svetlana.step@ufl.edu (S. Stepchenkova). https://doi.org/10.1016/j.annals.2018.09.002 Received 5 June 2018; Received in revised form 27 August 2018; Accepted 4 September 2018 0160-7383/ © 2018 Elsevier Ltd. All rights reserved.


agents, power structures between various destination actors and similar issues (see the review in Tran, Jeeva, & Pourabedin, 2016). A recent review of the tourism network literature (van der Zee & Vanneste, 2015) adds two more interest foci to network literature: coopeting (cooperative competition) networks with an emphasis on relationships between different stakeholders and studies interested in networks as a structure with a specific configuration. Whereas research on factors that govern network formation and functioning primarily employ a descriptive apparatus, the “network configuration” studies utilize graph theory to make conjectures based on the identified network structure. An example would be a study by Shih (2006), who focuses on how attractions and destination facilities are connected via tourist driving routes, or Liu, Huang, and Fu (2017), who describe an attraction network informed by tourist flows. Despite the fact that tourists are the main users and primary evaluators of services at a destination, networks in tourism research are rarely investigated from the tourist’s point of view (van der Zee & Vanneste, 2015). At the same time, tourists’ movements at a destination between various attractions, hotels, and restaurants comprise elaborate networks that can potentially be informative for understanding tourists’ interests and behavior and as such serve as a basis for tourist segmentation (Frochot & Morrison, 2000). Selecting homogeneous groups in an otherwise heterogeneous tourist market makes it possible to better tailor services, provide higher satisfaction, achieve repeat visitation, achieve more revenue for businesses, and, ultimately, create a more dynamic and vibrant destination. By being able to specialize in catering to the travel needs of a particular tourist segment, DMOs can gain an edge compared to other competing destinations (Dolnicar, 2008). 
“Market segmentation tends to produce depth of market position in the segments that are effectively defined and penetrated. The [organization that] employs market segmentation strives to secure one or more wedge-shaped pieces [of the market cake]” (Smith, 1956, p. 5). Thus, tourist segmentation is considered one of the most important tasks that destinations perform and one of the main subject areas of tourism research (Kirilenko & Stepchenkova, 2018; Yuan, Gretzel, & Tseng, 2015).

The present study brings together three elements discussed above. One is the need to identify sufficiently large and viable tourist segments, which would allow more effective dispensation of tourist information and customization of services. Another element is building a network of interrelated attractions that tourists visit while at a destination; such a network manifests their interests and behavior and can serve as a basis for segmentation. By doing this, the study is effectively placed into the “network configuration” group of studies because it utilizes graph theory with its quantitative approach, rather than qualitative methods of network sketching. Finally, in contrast to previous studies that primarily used survey data to gain insights about places of tourists’ interest at a destination, the basis for the network construction is UGC of travel reviews, which provide a source of big data information regarding tourists’ actual behavior at a destination as well as their interests and personal characteristics. Thus, the main aim of this study is to investigate whether tourist segmentation can be achieved through the network analysis of attractions that tourists visit at a destination using UGC as the information source.

Network analysis

A network is a convenient way of describing connected objects such as individuals, businesses, and attraction points. In particular, a network is a set of vertices (nodes), edges linking the nodes, and their descriptors.
The network is called unipartite when all nodes belong to one category. For example, in social networks, these nodes usually represent individuals/agents or groups of individuals/ agents, whereas edges represent relationships among those agents (e.g., friendship among individuals). The density of edges in the network is the quotient between the observed number of edges and the number of possible edges. A bipartite, or affiliation, network includes two categories of nodes (e.g., people and events), and a relationship can be produced only between nodes from different categories (e.g., a person attending an event). Additionally, a network (both unipartite and bipartite) can be weighted; that is, every edge can have a numerical value, e.g., the number of times a person has attended an event. Social network analysis provides quantitative methods to analyze such networks and has been extensively used in social sciences since the first half of the last century. An extended introduction to social network analysis and its applications in social systems is provided by Wasserman and Faust (1994) and Scott (2012). One of the most common research questions in the study of networks is the detection of clusters, also called communities. A map of communities reveals how the network is configured by showing the existence of parts of the graph that work to some extent autonomously. Such a map highlights the similarities or differences among nodes in terms of connectivity and, therefore, is conducive to inferring the internal forces that create the observed network configuration. The problem of clustering has existed in the analysis of social networks for several decades (Wasserman & Faust, 1994). Methodologically, community detection involves finding the best partition of nodes in groups or clusters in such a way that the density of links among nodes inside every cluster is higher than the density of edges among nodes belonging to different groups. 
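The density definition and the community criterion above (denser links inside clusters than between them) can be illustrated with a minimal Python sketch. The toy network, the function names, and the cluster assignment are hypothetical and are not taken from the study's data.

```python
def edge_density(n_nodes, edges):
    """Observed edges divided by possible edges in an undirected unipartite network."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible


def density_within(cluster, edges):
    """Edge density among the nodes of a single cluster."""
    cluster = set(cluster)
    inside = [(u, v) for u, v in edges if u in cluster and v in cluster]
    return len(inside) / (len(cluster) * (len(cluster) - 1) / 2)


def density_between(cluster_a, cluster_b, edges):
    """Edge density between two disjoint clusters."""
    a, b = set(cluster_a), set(cluster_b)
    cross = [(u, v) for u, v in edges
             if (u in a and v in b) or (u in b and v in a)]
    return len(cross) / (len(a) * len(b))


# Hypothetical toy network: two triangles joined by one bridging edge.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

overall = edge_density(6, EDGES)                       # 7 of 15 possible edges
inside = density_within([0, 1, 2], EDGES)              # 3 of 3 possible edges
across = density_between([0, 1, 2], [3, 4, 5], EDGES)  # 1 of 9 possible edges
```

In this toy case the partition into the two triangles satisfies the criterion, since the within-cluster density is far higher than the density between the clusters.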
Lately, new Big Data applications have emerged, revealing, e.g., scientific collaboration networks and communities in online social network websites (see Fortunato, 2010 for a review of some applications). In tourism, Baggio (2011) analyzed communities in the collaboration network of tourism stakeholders on the island of Elba, Italy. Asero, Gozzo, and Tomaselli (2016) unveiled clusters in the network formed by origins and destinations of tourist trips in Sicily. Williams, Terras, and Warwick (2013) detected communities in electronic word-of-mouth networks for a destination. Finally, David-Negre, Hernández, and Moreno-Gil (2018) unfolded a core-periphery structure of a network of tourists and activities in a destination. Notwithstanding these studies, the application of community detection methods to tourism networks is still very limited.

Tourist segmentation

Effective market segmentation identifies tourist segments whose interests a destination can effectively serve that are sufficiently


distinct, large in size, and accessible for communication (Dolnicar, 2008; Frochot and Morrison, 2000). Two major approaches to segmentation have been recognized (Dolničar, 2004; Mazanec, 1992): a priori, or commonsense, segmentation, in which the factors identifying the groups (e.g., demographics, expenditures, or travel patterns) are selected before the study is carried out (e.g., Dolnicar, 2004), and a posteriori, post hoc, or data-driven, segmentation. In data-driven segmentation, evaluations of the benefits tourists seek (Frochot & Morrison, 2000), their motivations (Andreu, Kozak, Avci, & Cifter, 2006; Park & Yoon, 2009), preferred vacation activities (Choi & Tsang, 2000; Hsieh, O’Leary, & Morrison, 1992; Kidd, King, & Whitelaw, 2004; McKercher, Ho, Cros, & SoMing, 2002; Mumuni & Mansour, 2014), or fears (Dolnicar, 2005) are subject to multivariate statistical analysis (e.g., factor-cluster algorithms) to produce heterogeneous groups of homogeneous respondents that can be further described through demographic, psychographic, or travel pattern characteristics. The two approaches can also be combined: for example, a demographic segment of interest is selected first and then split into subsegments via a posteriori analysis (Kastenholz, Davis, & Paul, 1999; Moscardo, Pearce, Morrison, Green, & O’leary, 2000) or two data-driven segmentations are conducted in succession (Dolničar, 2004). Online UGC presents new opportunities for segmentation that have not yet been adequately examined (Huang & Bian, 2009; Marine-Roig & Clavé, 2015; Pantano, Priporas, & Stylos, 2017). UGC travel reviews report real behavior that occurs at a destination, such as visitation to hotels, restaurants, or attractions, as opposed to hypothesized preferences and likelihood scores for future behavior, as often collected in surveys. 
UGC is free from the complications and biases associated with surveying human subjects, is easy to collect, and, potentially, allows for longitudinal analysis of visitors’ behavior. Despite the abovementioned availability of selfreported travel experiences, segmentation studies by preferred activities are few and often rely on self-reports prior to the trip (e.g., Mumuni & Mansour, 2014) or about a “vacation in general” (e.g., Choi & Tsang, 2000), which constitutes an obvious limitation, as “travel preferences are often hidden and are not explicitly known when users start to plan their trips, particularly if visiting an unfamiliar place” (Hsu, Lin, & Ho, 2012, p. 3257). It also has been observed that despite multiple Internet instruments for trip planning, few systems provide recommendations for destination attractions customized for individual tourists (Hsu et al., 2012). The advantages of such customization, for both tourists and destinations, are a reduction of time costs in information search, better meeting of individual interests, and more effective packaging, which increases the efficiency of destination operation. This study proposes and demonstrates a novel way to segment tourists via attractions that tourists visit at a destination. The premise for the analysis is that visitation patterns reflect tourists’ interests and motivations and the benefits they seek from travel, and the clustering of attractions reveals those interests. The spatial distribution of attraction clusters is indicative of tourist mobility patterns and can be used for management and marketing insights, as attractions from a certain group can be promoted together and/ or made more visible via tailored programming and packaging. The segmentation approach is based on the reviews that tourists post on online recommender platforms such as TripAdvisor and information they provide in user profiles about their travel interests, age, or gender. 
Online reviews constitute records of tourist behavior at a destination and are considered as proxy data for actual visitation records. This study uses network analysis, a method whose potential has not yet been fully realized in tourism studies. Segmentation using network analysis implies nonlinear models of relationships between tourists, their interests, and the attractions they visit (Baggio & Sainaghi, 2016), and constitutes a main contribution of this study to the tourism literature. Using readily available online data instead of surveys of human subjects is also considered a method-related contribution.

Two segmentation analyses are conducted. In Section “A posteriori segmentation: Tourists-attractions network”, communities of destination attractions are identified using only the names of attractions reviewed, which constitutes an a posteriori approach to segmentation. Spatial analysis is also conducted to verify that the produced communities are sensible from the perspective of regional proximity (Liu et al., 2017). In Section “Mixed a priori-a posteriori segmentation”, two kinds of information are employed: names of attractions reviewed and the tourist profile, specifically, tourist interests and age. The result is two networks: interests-attractions and age-attractions; both represent a mixed a priori-a posteriori segmentation approach. The segmentation solutions for all three networks are validated by a number of techniques, such as testing the obtained solution against randomly simulated models with the same network parameters. A clear differentiation of the segmentation solutions and the quality of clustering from those produced randomly (and, as such, considered the null models) signals that attraction communities are indeed formed based on visitors’ interests, motivations for travel, and benefits sought at a destination rather than by chance.

Situational context

The study selected a popular tourist destination in Florida – the city of St.
Augustine and its surrounding region, namely, St. Johns and Flagler Counties (Fig. 1A). The city of St. Augustine is located on the Atlantic coast of northeastern Florida and is “the site of the oldest continuously occupied European and African American settlements in the United States” (National Park Service, 2018), with rich Spanish and British heritage. The focal point of the destination is a former military fort, Castillo de San Marcos, which draws over 800 thousand visitors annually. The city displays colonial style architecture, with many attractions and exhibitions portraying the daily life of the Spanish and English settlers between the 16th and 18th centuries (Fig. 1B). The city possesses a remarkable example of Spanish Renaissance architecture – originally the Hotel Ponce de Leon, one of the earliest fine lodging establishments in Florida, and now a dormitory of Flagler College. The history of the city is also closely connected with the human rights movement: Dr. Martin Luther King Jr. was jailed here in 1964 (Library of Congress, 2018). Currently, the St. Augustine area boasts a lovely tourist scene with great shopping, funky museums, period reconstructions, trolley rides, and much more.

The encompassing St. Johns County is a popular tourist destination that hosts 6 million visitors annually (St Augustine, Ponte Vedra, & The Beaches Visitors and Convention Bureau, 2015). On TripAdvisor, the number of comments on St. Augustine’s attractions ranks 3rd in Florida (after Orlando and Key West). The beautiful shoreline is great for strolling, shell hunting, surfing, and kayaking. Its 40-foot sand dunes are among the highest in Florida. A string of white sand beaches leads to the Ponte Vedra Beach seaside community with a population of approximately 30 thousand people. This thriving upscale resort is world-famous for its golf courses. To the southeast of St. Augustine, a line of wide beaches (e.g., Butler, Crescent, and Flagler), state parks (e.g., Anastasia and Washington Oaks Gardens), and nature preserves (e.g., Graham Swamp Conservation Area) stretches along the Atlantic coast. The boundaries of two counties, Flagler and St. Johns, roughly correspond to a 90 km stretch of the coastal area centered at St. Augustine (Fig. 1A).

[Figure 1: maps of the study area and of the top ten attractions (legend: number of reviews), and a histogram of review counts]

Fig. 1. A: Study area: St. Johns and Flagler Counties, Florida. B: Top ten attractions (each with 350+ reviews); all top ten are located in St. Augustine or its vicinity. C: Frequency distribution of the number of reviews over attractions and tourists.

Method

Community detection

The problem of clustering in networks has been addressed from different perspectives. A good survey of available methods is provided by Bedi and Sharma (2016). In this paper, we are dealing with weighted unipartite and bipartite networks. Specifically, given the number of nodes N and M in two categories, a bipartite network can be algebraically represented using a weighted N × M matrix W, where elements w_ij indicate weights between nodes i and j. A zero-value weight means that there is no edge between the two nodes. In unweighted networks, w_ij can take only the values 0 and 1. For example, in a tourist-attraction bipartite network, the value w_ij = 1 means that tourist i has visited attraction j. In a weighted network, the weight may designate how many times a tourist has visited an attraction.

Among multiple methods of network analysis (Wasserman & Faust, 1994), a popular method uses the node degree, which indicates the number of neighbors (number of links) that every node has. An adaptation of this metric to bipartite networks can be found in Borgatti and Everett (1997). An extension of the node degree to weighted networks is the node strength, which is the sum of weights of all edges incident with the node (Opsahl, Agneessens, & Skvoretz, 2010).
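As an illustration of node degree and node strength in a weighted bipartite network, the following Python sketch computes both for the rows (tourists) and columns (attractions) of a small matrix W. The matrix values and function names are hypothetical, chosen only to make the definitions concrete.

```python
# Hypothetical toy matrix W: rows are tourists, columns are attractions,
# and each entry is the number of times the tourist visited the attraction.
W = [
    [1, 0, 2],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]


def row_degree(matrix):
    """Number of attractions linked to each tourist (non-zero entries per row)."""
    return [sum(1 for w in row if w > 0) for row in matrix]


def row_strength(matrix):
    """Sum of edge weights incident with each tourist."""
    return [sum(row) for row in matrix]


def col_degree(matrix):
    """Number of tourists linked to each attraction (non-zero entries per column)."""
    return [sum(1 for row in matrix if row[j] > 0) for j in range(len(matrix[0]))]


def col_strength(matrix):
    """Sum of edge weights incident with each attraction."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]


tourist_degree = row_degree(W)          # [2, 1, 2, 0]
tourist_strength = row_strength(W)      # [3, 1, 2, 0]
attraction_degree = col_degree(W)       # [2, 2, 1]
attraction_strength = col_strength(W)   # [2, 2, 2]
```

The third attraction shows why the two measures differ: it has the lowest degree (one tourist) but the same strength as the others, because that tourist visited it twice.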
The node degree/strength distribution in the network reveals the most central nodes in terms of the number and relevance of relationships and, in general, how the network is structured. Methods to detect communities in graphs can be classified as those based on statistical inference (fitting a generative network model on the data), clustering optimization and dynamics (running dynamical processes on the network) (Fortunato & Hric, 2016). We use the methodology based on clustering optimization since it has been extensively utilized in other fields and includes many fast and freely available algorithms specifically adapted to unipartite and bipartite weighted networks. To our knowledge, this is not the case for the other methods. In general terms, the algorithm based on optimization selects the network partition that maximizes the difference between the observed connectivity among nodes in every cluster and the expected connectivity given by the null model. Here, the null model is represented by a network with the same number of nodes, edges, and degree distribution as in the observed network, but with randomized links. Specifically, in this paper, we use two approaches: 1. The probabilistic approach based on the analysis of statistical significance of clusters (Lancichinetti, Radicchi, & Ramasco, 2010). Specifically, we used the algorithm OSLOM (Lancichinetti, Radicchi, Ramasco, & Fortunato, 2011). In this algorithm, the clusters are generated independently; hence, some nodes may overlap (belong to multiple different clusters) whereas others may be


isolated (singletons that do not belong to any cluster). The latter nodes may be assigned to the most likely cluster disregarding the statistical significance. 2. The modularity optimization approach (Newman & Girvan, 2004). Modularity measures the difference between the connectivity in every cluster and the connectivity in a null model. In weighted bipartite networks, the modularity function takes the following form (Dormann & Strauss, 2014):

$$
Q = \frac{1}{m} \sum_{C=1}^{n_C} \sum_{ij \in C} \left( w_{ij} - \frac{s_i s_j}{m} \right), \tag{1}
$$

where W = {w_ij} is the weighted bipartite matrix of the real network, s_i is the strength of node i belonging to community C, m is the sum of weights in the matrix, m = ∑ w_ij, and n_C is the number of communities. Inside the summation operator, the first term indicates the observed connectivity between the nodes and the second term indicates the expected connectivity (in a null model). Hence, the modularity optimization approach searches for a graph partition that maximizes function Q. Among several algorithmic realizations of this approach, the most popular ones are QuanBiMo (proposed by Dormann & Strauss, 2014) and LPAwb+/DIRT_LPAwb+ (Beckett, 2016); the latter algorithm is a generalization of an earlier algorithm (Liu & Murata, 2010) developed for unweighted bipartite networks and was demonstrated to have an efficiency superior to QuanBiMo (Beckett, 2016).

Note that these optimization methods include stochastic components. Hence, it is customary to run the algorithm multiple times until the solution is stabilized. We chose to use OSLOM and DIRT_LPAwb+ for community detection in weighted unipartite and bipartite networks, respectively, since (1) both algorithms have been demonstrated to be more efficient than others (Beckett, 2016; Xie, Kelley, & Szymanski, 2013) and (2) they are freely available for download at www.oslom.org and as part of the R package “bipartite”, respectively.

The modularity optimization approach does not include any statistical validation of the clustering. An a posteriori validation can be conducted by applying the algorithms to null models and comparing the results with those obtained with the real network. Statistical differences between the two clustering results would reveal that the communities found in the real data are not an expected outcome of a random arrangement of links. We conduct this comparison with both algorithms.

Data description

TripAdvisor lists 328 attractions in the area; they are mainly centered at St. Augustine (approximately 235 attractions), with secondary centers at Ponte Vedra (an unincorporated community in the Jacksonville Beach area) to the north and the Palm Coast area to the south, with approximately 40 attractions each. Out of the 328 attractions in the area, the top 30 (ordered by the number of reviews) are located either in St. Augustine or in close proximity. The most frequently reviewed attraction in St. Augustine is Castillo de San Marcos National Monument, which is the oldest masonry fort in the US, followed by the historical/shopping St. George Street, St. Augustine Distillery, and a city tour company (see Fig. 1B).

TripAdvisor review data of St. Augustine attractions were collected using custom Python code in the following way. First, Florida visitors were identified from TripAdvisor reviews of Florida hotels. To reduce the possibility of collecting non-authentic UGC (such as reviews generated by property managers pretending to be customers; see Trend, 2013), we selected only the reviewers who had published at least 10 reviews (Feng, Xing, Gogar, & Choi, 2012) and collected their profile data. All reviews of places marked as “attractions” were collected for these reviewers. In total, the dataset contained 14,025 English language reviews published by 5513 reviewers from 2005 until February 2018. The majority of collected reviews (72%) were published in 2016–2018; only 6% of reviews were published prior to 2014.

Fig. 1C presents the percentage of tourists/attractions that have posted/received a certain number of reviews. Using the network analysis terminology, the figure presents the degree distribution of the nodes in the tourist and attraction category. The group of ten attractions mapped in Fig. 1B received over 350 comments each, whereas the majority of attractions received fewer than 50 comments.
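A degree distribution of the kind shown in Fig. 1C can be computed from a list of (tourist, attraction) review records. The Python sketch below is only an illustration under stated assumptions: the records are invented, the function name is hypothetical, and repeat reviews of the same attraction by the same tourist are collapsed into one link.

```python
from collections import Counter


def degree_distribution(pairs, index):
    """Percentage of nodes having each number of reviews.

    pairs: iterable of (tourist_id, attraction_id) review records.
    index: 0 to count reviews per tourist, 1 to count reviews per attraction.
    """
    per_node = Counter(pair[index] for pair in set(pairs))  # set() drops repeats
    histogram = Counter(per_node.values())
    n_nodes = len(per_node)
    return {k: 100.0 * count / n_nodes for k, count in sorted(histogram.items())}


# Hypothetical review records, not taken from the study's dataset.
REVIEWS = [
    ("t1", "fort"), ("t1", "street"),
    ("t2", "fort"),
    ("t3", "fort"), ("t3", "zoo"), ("t3", "street"),
    ("t4", "fort"),
]

tourist_dist = degree_distribution(REVIEWS, 0)     # {1: 50.0, 2: 25.0, 3: 25.0}
attraction_dist = degree_distribution(REVIEWS, 1)  # one attraction each at 1, 2 and 4 reviews
```

Even in this tiny example the attraction side is skewed: a single attraction ("fort") collects four of the seven reviews.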
Such heavy-tailed degree distributions are characteristic of real networks and usually fit a power-law probability distribution function (Albert & Barabási, 2002). This observation indicates that the researched destination has several star attractions that are clearly more visited than the rest. The distribution of the number of reviews per tourist is similar: nearly half of the tourists (44%) reviewed only one attraction, and 22% reviewed two attractions. At the other extreme, two of the reviewers reviewed as many as 42 attractions. Meanwhile, about half of the reviews (49%) came from visitors who reviewed three or fewer attractions. Interestingly, the mean and median numbers of reviewed attractions were similar for visitors coming from Florida, other US states, and visitors from other countries (Table 1). Based on this, we chose to treat the review database as a whole.

Table 1
Number of reviews posted. Note that the total number of reviews is greater than the sum of the previous rows since some reviewers selected not to identify their place of residence or the place of residence was ambiguous/unidentifiable.

Place                  Number of reviewers   Number of reviews   Mean   Median
Florida                2456                  6683                2.7    2
US except Florida      1823                  4568                2.5    2
Countries except US    332                   761                 2.3    1
Total                  5513                  14,025              2.5    2

The majority of the reviewers did not state their gender or age group (Table 2). For those who did, the age groups of 35–49 and 50–64 represented the majority (70% of reviewers who stated their age), and there was a somewhat higher percentage of females (60% of those who stated their gender).

Table 2
Demographic profile of reviewers (percentage of the total number of reviewers). Rows: gender; columns: age group.

Gender           18–34   35–49   50–64   65+    Unknown   Total
Male             1.2     3.9     6.7     3.4    3.3       18.4
Female           2.6     6.8     8.1     1.6    5.1       24.2
Unknown/Other    0.2     0.3     0.5     0.1    56.3      57.4
Total            4.0     11.0    15.3    5.0    64.7      100.0

TripAdvisor allows reviewers to select, from a list of 19 keywords, one or more keywords that characterize them best. On average, a reviewer used 3.2 keywords (Table 3). To reduce the dimensionality, we joined keywords that tend to appear together with a hierarchical cluster analysis (see e.g., Romesburg, 2004). The average within-group linkage hierarchical clustering method was performed using an agglomerative (bottom-up) approach, with phi (an analog of Pearson’s correlation for binary data) as the similarity measure. Based on the results of the cluster analysis, we joined the categories “art and architecture lover”, “history buff”, and “urban explorer” into a new category “urban”; the categories “ecotourist” and “nature lover” into “nature”; and the categories “nightlife seeker” and “trendsetter” into “nightlife”. To confirm the robustness of the solution, we repeated clustering using the between-group linkage method and obtained the same result. Finally, we removed the smallest category, “vegetarian” (0.7% of the total category use), thereby reducing the number of categories to 13.

Results

Starting from TripAdvisor review data for attractions in St. Johns and Flagler Counties, we built three networks to study the interrelationships between tourists and attractions at the destination. The following two sections represent two analyses.
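As a rough sketch of the kind of network construction used in the analyses that follow, the Python code below derives a weighted attraction network from (tourist, attraction) review records: two attractions are linked when the same tourist reviewed both, and the edge weight counts such tourists. The records and the function name are hypothetical, and this is not the authors' code.

```python
from collections import defaultdict
from itertools import combinations


def co_review_network(pairs):
    """Return {(attraction_a, attraction_b): number of tourists who reviewed both}."""
    by_tourist = defaultdict(set)
    for tourist, attraction in pairs:
        by_tourist[tourist].add(attraction)  # a set ignores repeat reviews
    weights = defaultdict(int)
    for attractions in by_tourist.values():
        # Every pair of attractions reviewed by the same tourist gains one unit of weight.
        for a, b in combinations(sorted(attractions), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical review records, not taken from the study's dataset.
REVIEWS = [
    ("t1", "fort"), ("t1", "street"),
    ("t2", "fort"),
    ("t3", "fort"), ("t3", "zoo"), ("t3", "street"),
    ("t4", "fort"),
]

network = co_review_network(REVIEWS)
```

Tourists who reviewed a single attraction (t2 and t4 here) contribute no edges, which is why attractions reviewed only by such tourists end up isolated in a network built this way.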
First, in Analysis 1 (Section “A posteriori segmentation: Tourists-attractions network”), we consider tourists without attributing them to any particular group based on individual characteristics stated in their user profile (Table 1). The results, hence, represent an a posteriori segmentation of attractions, where attraction clusters are indicative of tourists’ preferred activities. Then, in Analysis 2 (Section “Mixed a priori-a posteriori segmentation”), we consider tourists belonging to groups defined a priori based on their declared interests and age. The results, therefore, represent a mixed a priori-a posteriori approach in the segmentation typology (Dolničar, 2004). Note that Analysis 2 incorporates a smaller volume of data since not all of the reviewers elect to make their interests or age public. The two analyses complement each other in segmenting tourists to the same destination. In Section 4.3, we validate the obtained networks by comparing them with those generated randomly.

Table 3. Reviewers' travel interests (percentage of the total number of checked interest categories). Note that one reviewer may have multiple interests (mean = 4.4); percentage points were normalized to add to 100%.

| Interest category | % |
|---|---|
| Foodie | 10.0 |
| Beach Goer | 9.5 |
| Like a Local | 8.7 |
| Nature Lover | 8.6 |
| Family Vacationer | 8.2 |
| History Buff | 8.0 |
| Peace and Quiet Seeker | 7.4 |
| Thrifty Traveler | 7.1 |
| 60+ Traveler | 6.2 |
| Urban Explorer | 4.3 |
| Luxury Traveler | 4.3 |
| Art and Architecture Lover | 4.1 |
| Thrill Seeker | 3.9 |
| Shopping Fanatic | 3.0 |
| Ecotourist | 2.0 |
| Nightlife Seeker | 1.6 |
| Trendsetter | 1.2 |
| Backpacker | 1.1 |
| Vegetarian | 0.7 |

A posteriori segmentation: tourists-attractions network

In Analysis 1, we build a unipartite network of attraction nodes with connections formed by the attraction reviews published on tripadvisor.com. That is, an edge between two attraction nodes appears when a tourist has reviewed both attractions. The edge's weight is the number of tourists who have commented on both attractions. Note that one tourist can review the same attraction multiple times; in our sample, such duplicate links constitute only 1.8% of the total number of edges and are excluded from consideration. In this network, the characteristics of the tourists, such as their age, are not accounted for. Instead, we concentrate on researching a common pattern of visitation. The identified communities represent the clusters of


Table 4. List of the 20 most commented attractions in the St. Augustine sample. Degree is expressed as a fraction of the total number of reviewers.

| Rank | Name | N reviews | Degree |
|---|---|---|---|
| 1 | Castillo de San Marcos | 1431 | 0.261 |
| 2 | St. George Street | 908 | 0.165 |
| 3 | St. Augustine Distillery | 852 | 0.155 |
| 4 | Old Town Trolley Tours | 816 | 0.149 |
| 5 | Lighthouse & Maritime Museum | 797 | 0.145 |
| 6 | Flagler College | 593 | 0.108 |
| 7 | St. Augustine Beach | 581 | 0.106 |
| 8 | Alligator Farm Zoo Park | 413 | 0.075 |
| 9 | The Lightner Museum | 412 | 0.075 |
| 10 | Fountain of Youth Archeol. Park | 374 | 0.068 |
| 11 | Pirate & Treasure Museum | 329 | 0.060 |
| 12 | Old Jail | 315 | 0.057 |
| 13 | Cathedral Basilica | 268 | 0.049 |
| 14 | Red Train Tours | 243 | 0.044 |
| 15 | San Sebastian Winery | 243 | 0.044 |
| 16 | Fort Matanzas NM | 240 | 0.044 |
| 17 | Ripley’s Believe It Or Not | 233 | 0.042 |
| 18 | Old City | 199 | 0.036 |
| 19 | Vilano Beach | 175 | 0.032 |
| 20 | Whetstone Chocolates | 141 | 0.026 |

attractions commonly visited by a certain (not identified) segment of tourists.

The constructed attraction network is disconnected; that is, some attractions (6.4% of the sample) are isolated from the rest of the network. These attractions are relatively unpopular and were reviewed by few tourists. Since such attractions represent negligible points of interest in the destination, we remove them from consideration, reducing the sample to 307 attractions and 5490 reviewers. This reduced network has 14,004 edges. As described in Section 3.3, the distribution of attraction popularity, defined as the number of reviews, is heavy tailed, which is typical of destinations with few star attractions (listed in Table 4). These star attractions are mostly located in St. Augustine and its vicinity and form the core of the most popular places and activities in the destination. The top attraction, Castillo de San Marcos, was reviewed by more than a quarter of all reviewers (26.1%).

The OSLOM algorithm detected seven partially overlapping communities represented by clusters of attractions in the network, with sizes of 152, 69, 28, 28, 13, 13, and 7 (see Table 5). To ensure the robustness of the solution, we calculated multiple solutions using variable parameters of the algorithm and obtained similar results (Supplementary Material A). Cluster 1 is the largest (about half of all attractions) and includes the most popular attraction. Moreover, it accumulates 3/4 of the total number of reviews (74.7%). The other six clusters mainly represent groups of secondary attractions; note, however, that the three most visited attractions in cluster 2 are shared with cluster 1. These overlapping nodes act as “bridging” points between the different groups of attractions that tourists prefer to visit together.

Fig. 2 shows the geographical location of the four major clusters. Note that clusters 1 and 3 are mainly located in St. Augustine, whereas clusters 2 and 4 include the attractions located in Palm Coast and Ponte Vedra, respectively. Therefore, there is a certain correspondence of clusters and geographical location, pointing to the existence of differentiated tourist areas in the destination.

To investigate whether the seven observed clusters represent an ordered disposition of the relationships among tourists and attractions and are unlikely to result from random interactions, the discovered communities of tourists/attractions were compared with those obtained using a null model. The null model used the same number of nodes (tourists and attractions) and the same degree distribution in both categories but randomized the network connectivity. That is, the null model represented tourists randomly selecting the attractions they visit. The comparison clearly demonstrated the difference between the observed network and the null model: in the null model, the OSLOM algorithm identified a much smaller number of communities (Fig. 3). Similar results were obtained using modularity computed for the null models and observed data (Fig. 3). Modularity was computed following Shen, Cheng, Cai, and Hu (2009) as a modification of expression (1), adapted to overlapping clusters. Observe that the expected modularity remains close to zero in all simulations. See Supplementary Material C for the analysis of the statistical significance of observed clusters.
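The null-model comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code, and it simplifies their procedure in two labeled ways: it keeps a hand-made, non-overlapping partition fixed instead of rerunning OSLOM on every randomized network, and it uses the standard weighted Newman-Girvan modularity rather than the overlapping variant of Shen et al. (2009). The function names (`project`, `modularity`, `randomize`) and the toy review list are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def project(reviews):
    """Attraction-attraction network: weight = number of reviewers who reviewed both."""
    by_reviewer = defaultdict(set)  # a set drops repeat reviews of one attraction
    for reviewer, attraction in reviews:
        by_reviewer[reviewer].add(attraction)
    weights = Counter()
    for attractions in by_reviewer.values():
        for a, b in combinations(sorted(attractions), 2):
            weights[(a, b)] += 1
    return weights


def modularity(weights, community_of):
    """Weighted Newman-Girvan modularity of a non-overlapping partition."""
    m = sum(weights.values())
    if m == 0:
        return 0.0
    strength, internal, total = Counter(), Counter(), Counter()
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    for node, s in strength.items():
        total[community_of[node]] += s
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


def randomize(reviews, n_swaps, rng):
    """Degree-preserving null model: swap attractions between random pairs of reviews."""
    edges = sorted(set(reviews))
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (r1, a1), (r2, a2) = edges[i], edges[j]
        if (r1, a2) in present or (r2, a1) in present:
            continue  # the swap would duplicate an existing review; skip it
        present -= {(r1, a1), (r2, a2)}
        present |= {(r1, a2), (r2, a1)}
        edges[i], edges[j] = (r1, a2), (r2, a1)
    return edges


# Hypothetical toy data: (reviewer, attraction) pairs and an assumed partition.
reviews = [
    ("r1", "Fort"), ("r1", "Museum"), ("r1", "College"),
    ("r2", "Fort"), ("r2", "Museum"),
    ("r3", "Fort"), ("r3", "College"),
    ("r4", "Beach"), ("r4", "Pier"), ("r4", "Park"),
    ("r5", "Beach"), ("r5", "Pier"),
    ("r6", "Fort"), ("r6", "Beach"),
]
partition = {"Fort": 0, "Museum": 0, "College": 0, "Beach": 1, "Pier": 1, "Park": 1}

rng = random.Random(0)
observed_q = modularity(project(reviews), partition)
null_q = [
    modularity(project(randomize(reviews, 10 * len(reviews), rng)), partition)
    for _ in range(200)
]
print(f"observed Q = {observed_q:.3f}")
print(f"mean null Q = {sum(null_q) / len(null_q):.3f}")
```

The swap step leaves every reviewer's number of reviews and every attraction's number of reviewers unchanged, so only the wiring between them is randomized. A faithful replication would additionally rerun community detection on each null network and compare the number of communities, as in Fig. 3.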

Table 5. List of the top ten most commented attractions in every cluster. N indicates the cluster size. Marked with *: the three attractions common to clusters 1 and 2. Note that Fort Matanzas NM ranks lower than 10 in cluster 1. Three small clusters (N ≤ 13) are not shown for clarity.

| Rank | Cluster 1 (N = 152) | Cluster 2 (N = 69) | Cluster 3 (N = 28) | Cluster 4 (N = 28) |
|---|---|---|---|---|
| 1 | Castillo de San Marcos | Flagler College* | Old St. Augustine Village | TPC at Sawgrass Stadium Course |
| 2 | St. George Street | The Lightner Museum* | St. Augustine Gold Tours | The Ximenez-Fatio House |
| 3 | St. Augustine Distillery | Fort Matanzas National Monument* | St. Augustine Premium Outlets | Mickler's Landing Beach |
| 4 | Old Town Trolley Tours of St. Augustine | Washington Oaks Gardens State Park | Fiesta Falls Miniature Golf | The Spa at Ponte Vedra Inn & Club |
| 5 | St. Augustine Lighthouse & Maritime Museum | Marineland Dolphin Adventure | St Augustine Visitor Information Center | St. Augustine Aquarium |
| 6 | Flagler College* | Schooner Freedom | The Ice Plant | Old City Farmers Market |
| 7 | St. Augustine Beach | The St. Augustine Amphitheater | St. Augustine Outlets | Ponte Vedra Concert Hall |
| 8 | St. Augustine Alligator Farm Zoological Park | Flagler Beach Municipal Pier | The Oldest House Museum Complex | TPC at Sawgrass Valley Course |
| 9 | The Lightner Museum* | European village | Crescent Beach | Ponte Vedra Beach |
| 10 | Fountain of Youth Archaeological Park | Gamble Rogers Memorial State Recreation Area | El Galeon | Serenata Beach Club |


[Figure 2: map of the study area (St. Johns and Flagler Counties and their surroundings) with attractions marked by community, 1 to 4.]

Fig. 2. St. Augustine’s four largest communities of attractions (those with at least 28 members).

Fig. 3. Number of communities and modularity obtained using 200 null model trials. The dashed line shows the number of communities for the empirical network. The observed modularity is 0.0775.

Mixed a priori-a posteriori segmentation

Whereas in the previous section we considered all reviewers in the same way, Analysis 2 differentiates between tourist groups. Based on the data, three ways of segmenting tourists into groups are feasible: by interests, by age, and by gender. The analysis of gender did not reveal a meaningful clustering and was excluded from this paper due to space limitations. Hence, two mixed segmentation analyses were performed: in the first, the groups were defined by tourist interests, and in the second, by the tourists' age groups. Both attributes were extracted from the TripAdvisor user’s profile (see Tables 2 and 3). Specifically, we built two bipartite networks: a) tourist interests and attractions and b) age groups and attractions. Note that, similarly to the previous section, the edges between the tourist groups and attractions in both networks have weights, which indicate the number of tourists in every group reviewing the particular attraction. Note also that tourists tend to indicate multiple interests; these tourists belong to multiple groups. The tourists who indicated no interests (or did not indicate their age in the second analysis) were removed.
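As a rough illustration of the weighted bipartite construction just described, the following Python sketch counts, for every (group, attraction) pair, the distinct reviewers in that group who reviewed the attraction. It then scores a given module assignment with Barber's bipartite modularity in its weighted form, which is assumed here to be the quantity behind the paper's modularity values; the paper's own expression is not reproduced in this section. All function names, profiles, and reviews are hypothetical.

```python
from collections import Counter


def group_attraction_weights(profiles, reviews):
    """weights[(group, attraction)] = distinct reviewers of that group who reviewed it."""
    seen = set()
    weights = Counter()
    for reviewer, attraction in reviews:
        if (reviewer, attraction) in seen:
            continue  # count a repeat review of the same attraction only once
        seen.add((reviewer, attraction))
        # Reviewers with no declared group contribute nothing (they are removed).
        for group in profiles.get(reviewer, ()):
            weights[(group, attraction)] += 1
    return weights


def bipartite_modularity(weights, module_of_group, module_of_attraction):
    """Weighted bipartite (Barber) modularity of a given module assignment."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    row, col = Counter(), Counter()
    inside = 0
    for (group, attraction), w in weights.items():
        row[module_of_group[group]] += w
        col[module_of_attraction[attraction]] += w
        if module_of_group[group] == module_of_attraction[attraction]:
            inside += w
    expected = sum(row[c] * col[c] for c in row) / total
    return (inside - expected) / total


# Hypothetical toy data: a reviewer may declare several interests or none.
profiles = {
    "r1": {"Foodie", "Nightlife"},
    "r2": {"Family"},
    "r3": {"Family", "Beach goer"},
    "r4": set(),
}
reviews = [
    ("r1", "Distillery"), ("r1", "Winery"),
    ("r2", "Fort"),
    ("r3", "Fort"), ("r3", "Beach"), ("r3", "Fort"),
    ("r4", "Fort"),
]
weights = group_attraction_weights(profiles, reviews)
q = bipartite_modularity(
    weights,
    {"Foodie": 0, "Nightlife": 0, "Family": 1, "Beach goer": 1},
    {"Distillery": 0, "Winery": 0, "Fort": 1, "Beach": 1},
)
print(dict(weights))
print(f"Q = {q:.4f}")
```

Because a reviewer with several interests adds weight to several groups, the group totals exceed the number of reviewers. The sketch only evaluates a fixed assignment; an algorithm such as DIRT_LPAwb+ would search for the assignment that maximizes the score.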


Table 6. Clusters of interests-attractions in St. Augustine. N indicates the number of attractions in the cluster. In each cluster, only the 10 most commented attractions are shown.

| | Cluster 1 (N = 126) | Cluster 2 (N = 93) | Cluster 3 (N = 85) |
|---|---|---|---|
| Interests | Family vacationer; Beach goer; Nature; Thrifty traveler | Like a local; Peace and quiet seeker; Urban; 60+ traveler | Luxury traveler; Foodie; Thrill seeker; Shopping Fanatic; Nightlife |
| Attraction 1 | Castillo de San Marcos | St. George Street | St. Augustine Distillery |
| Attraction 2 | St. Augustine Lighthouse & Maritime Museum | Old Town Trolley Tours of St. Augustine | Fountain of Youth Archaeological Park |
| Attraction 3 | St. Augustine Beach | Flagler College | Old Jail |
| Attraction 4 | St. Augustine Alligator Farm Zoological Park | The Lightner Museum | San Sebastian Winery |
| Attraction 5 | St. Augustine Pirate & Treasure Museum | Cathedral Basilica of St. Augustine | World Golf Hall of Fame |
| Attraction 6 | Ripley's Believe It Or Not | Red Train Tours | The Tini Martini Bar |
| Attraction 7 | Vilano Beach | Fort Matanzas National Monument | Ghosts and Gravestones St. Augustine Frightseeing Tour |
| Attraction 8 | Anastasia State Park | Old City | Old St. Augustine Village |
| Attraction 9 | Whetstone Chocolates | Mission of Nombre de Dios | Schooner Freedom |
| Attraction 10 | Oldest Wooden Schoolhouse | The Oldest Store Museum | The St. Augustine Amphitheater |

Interests-attractions network

We considered 41,966 reviews of 304 attractions published by 13 groups of tourists, identified by their interests as discussed in Section 3.3. The DIRT_LPAwb+ algorithm (Section “Data description”) recognized three differentiated interests/attractions communities (Table 6). Similar to the previous section, we ensured the robustness of the solution by repeatedly applying the clustering algorithm while varying its parameters (see Supplementary Material B.1 for details). Notice that, unlike in the a posteriori segmentation analysis, each of the clusters contains some of the most popular attractions (see Table 4). Every cluster of interests can be collectively interpreted as a particular type of tourist. The first group includes tourists who travel with a family, often to beach and nature destinations and on a limited budget. The second group is better represented by those more interested in local urban attractions; these tourists also tend to be more senior. The third group includes those tourists looking for food, thrills, shopping and nightlife experiences. The clusters relate these types of tourists to attractions they generally enjoy at the specific destinations.

Age-attractions network

Similar to Section “Interests-attractions network”, the clustering algorithm was applied to 5085 reviews of 247 attractions published by 4 groups of tourists, identified by their age group. The DIRT_LPAwb+ algorithm identified three communities corresponding to consecutive age ranges: 18–49, 50–64 and 65+ years old (Table 7). Notice that the list of attractions visited by the 65+ age group is similar to the list identified for the second group of tourists in the previous subsection (Interests-attractions network), which also included more senior tourists: seven of the ten most visited attractions were identical; this additionally demonstrates the robustness of the analysis. Notably, among the attractions visited by the first and second age groups, the younger group mostly visited the main attraction and theme parks, whereas the middle-aged group enjoyed museums, beaches and bars/wineries.

Table 7. Clusters of age group-attractions in St. Augustine. N indicates the number of attractions in the cluster.

| | Cluster 1 (N = 125) | Cluster 2 (N = 51) | Cluster 3 (N = 71) |
|---|---|---|---|
| Age | 18–34, 35–49 | 50–64 | 65+ |
| Attraction 1 | Castillo de San Marcos | St. Augustine Distillery | St. George Street |
| Attraction 2 | St. Augustine Alligator Farm Zoological Park | St. Augustine Lighthouse & Maritime Museum | Old Town Trolley Tours of St. Augustine |
| Attraction 3 | Fountain of Youth Archaeological Park | St. Augustine Beach | Flagler College |
| Attraction 4 | St. Augustine Pirate & Treasure Museum | Red Train Tours | The Lightner Museum |
| Attraction 5 | Fort Matanzas National Monument | San Sebastian Winery | Old Jail |
| Attraction 6 | Ripley's Believe It Or Not | The Tini Martini Bar | Cathedral Basilica of St. Augustine |
| Attraction 7 | Vilano Beach | Washington Oaks Gardens State Park | Old City |
| Attraction 8 | Mission of Nombre de Dios | St. Augustine Premium Outlets | World Golf Hall of Fame |
| Attraction 9 | Anastasia State Park | St Augustine Visitor Information Center | The Oldest Store Museum |
| Attraction 10 | Whetstone Chocolates | St. Augustine Outlets | Memorial Presbyterian Church |

Comparisons with the null models

To investigate whether the observed clusters differ from those appearing by chance, we compared the observed clustering with those obtained using the null model (see Section “A posteriori segmentation: Tourists-attractions network”). The comparison of the


Fig. 4. Number of communities and modularity (Q) obtained for 200 null model trials: Left, Interests-attractions network (observed modularity Q = 0.0392); Right, Age groups-attractions network (observed modularity Q = 0.1056).

observed number of communities and modularity and those obtained from 200 runs of DIRT_LPAwb+ algorithm using the null model showed that the identified communities are unlikely to be formed by chance (Fig. 4). Notice that for both segmentations, the null model tends to produce more communities (the maximum in case of age groups) and lower modularity compared to the observed network. Supplementary Materials B.2 and C include the analysis of the statistical significance of observed clusters and a comparison of other indicators, demonstrating that the observed communities are unlikely to be formed by chance.

Discussion

One contribution of this study to tourism networks research is that it is conducted from a tourist perspective rather than emphasizing the role of various business, organizational, and political entities at a destination; thus, this study adds to a presently very small group of studies. The authors take UGC as proxy data for actual behavior at a destination, circumventing limitations and restrictions associated with surveying human subjects. Utilizing UGC as a data source for complex analyses is an underexplored research mode in the tourism network literature and, thus, another contribution of the study. The data, however, are still a sample, albeit a very large one, of the target population of destination visitors, as not everyone leaves a review about their destination experience (compare ∼50,000 reviews studied vs. ∼6,000,000 annual visits to the area). Determining whether UGC data are biased toward certain demographic and tourist groups requires a thorough investigation by tourism scholars, as the usage of UGC data for research is rapidly expanding.

Another contribution of the study is its novel take on a long-standing methodology of tourist segmentation, with network analysis rather than multivariate statistical methods playing the central role.
The demonstrated method falls in the fundamental segmentation taxonomy, contrasting an a posteriori approach with a mixed a priori-a posteriori design. Following Baggio and Sainaghi (2016), the authors uphold that the processes underlying human behavior and relationships between people, places, and experiences might require nonlinear models. This study provides a convincing example of the feasibility of network analysis for tourist segmentation, as it results in distinctive, sufficiently large, and accessible clusters of attractions that are inherently connected via tourist visitations and interests, as well as geographically. Knowledge about attraction clusters that are of greater interest to a particular group of visitors provides opportunities for better packaging and programming of visitors’ leisure time at a destination and is conducive to developing communication and copromotional efforts by DMOs. In Analysis 1, the attraction clusters emerge via tourist reviews acting as a proxy for actual visitation behavior. The solution identifies four quite large communities of attractions; however, surmising tourist interests from the attractions’ names is not a clearcut process. The geographical aspect provides spatial distribution of observed clusters (Fig. 2) and, therefore, gives an indication of which attractions can be copromoted, copackaged, and coprogrammed by respective DMOs. Analysis 2 results in two networks: interests-attractions and ages-attractions, each with three large attraction clusters. In the interests-attractions network, attraction clusters for family vacationers and nature lovers, older urban travelers seeking peace and quiet, and those looking for thrills, good food and drinks as well as an attractive nightlife scene are clearly separated. 
This solution is considered the most useful from a marketing standpoint, especially if used in combination with age information, as attractions in respective clusters can also be described by what tourist age group they primarily serve. By itself, the ages-attractions segmentation network is less informative, in part because the two youngest groups (18–24 and 25–34) were joined together to have a comparable number of network nodes. Needless to say, all attraction clusters can be spatially visualized for the a priori-a posteriori networks (Fig. 2 and Supplementary Material B.2). Although findings from both analyses can stimulate the creation of organizational networks to adapt and coordinate the supply to tourists’ interests, a priori-a posteriori segmentation, in the authors’ view, provides the most possibilities for tailored communications between destinations and their actual and potential visitors.

Methodological considerations and future research

To provide a balanced appreciation of the study findings, we would like to point out a few methodological considerations. First,


the data collection design was somewhat biased against infrequent visitors to Florida coming prior to 2016. That is, first, 2016–February 2018 Florida visitor data were identified through a TripAdvisor search; then, the data on all Florida trips were collected for these visitors. For example, data on a tourist last visiting Florida in 2015 were not collected; however, data on a tourist visiting Florida in 2015 and 2016 were collected. The data for 2016–2018 (72% of the dataset) were not biased. To investigate the extent to which this data collection design affected the findings, we repeated Analysis 1 using only the data for 2016–2017 and found no difference in outcomes. We decided against running Analysis 2 with the reduced data due to the enhanced data requirements of bipartite analysis. We allowed overlapping in the identification of attraction clusters in Analysis 1. Although there are methods that do not allow overlapping for unipartite networks, we included this possibility to identify common attractions belonging to several clusters. This is a natural outcome of an unrestricted segmentation analysis, where differentiated types of tourists can share some of their preferred attractions. By using these methods, we can find attractions that “bridge” two or more market segments. The overlapping option seems especially relevant for the case at hand, as the study area (Fig. 1) has star attractions that might be interesting to a wide variety of visitors. The methodology employed for the mixed segmentation (Analysis 2) is based on the maximization of the modularity function. Note that the observed modularity is lower than those reported in comparable studies (between 0.3 and 0.6, e.g., Beckett, 2016). 
Recall that positive modularity values Q are observed when the number of edges within the groups exceeds those expected by chance, with a maximum of Qmax (Qmax < 1) depending on the data distribution, such as the size and the number of clusters in a particular network (Good, de Montjoye, & Clauset, 2010). Hence, low modularity can be explained by the data distribution, that is, the presence of star attractions at the destination: these star attractions frequently receive comments from tourists from all clusters, which reduces the network modularity. A different clustering algorithm allowing for group overlapping might overcome this methodological limitation. However, we are not aware of any established clustering algorithm that deals with overlapping in weighted bipartite networks. Advances in this field would be very valuable for segmentation analysis in tourism using network analysis. This brings us to the issue of model robustness. Partially, this question was answered by model selection from the list of the approaches already successfully used in similar applications. We also performed additional model computations under different values of model parameters (Supplementary Material A and B.1). Additionally, the results obtained in the two parts of Analysis 2 are consistent. However, a question arises: how would the results change if another network analysis method were applied? Considering that a model is a simplified representation of objective reality, different parametrizations would give different views of this reality that are not necessarily consistent. This is a common problem in science; large international undertakings such as the Coupled Model Intercomparison Project (CMIP) of the World Climate Research Programme have attempted to tackle it by comparing the outcomes of many dozens of independently developed models, with somewhat promising results. This approach, however, requires resources not feasible in the area of tourism research. 
However, since the abovementioned attempt fundamentally works with one object (the Earth’s climate), it might work with another object, in our case, tourist attractions. We believe that the accumulation of research findings using a variety of destination types will eventually result in a set of “gold standard” methods tested in multiple settings.

Several directions for future research on network analysis segmentation with UGC emerged while working on the current study that can be pursued with existing quantitative instruments. For one, several types of network nodes can be considered to describe tourist preferences at a destination. Questions such as “which attractions does a particular group of hotels serve?” and “which restaurants are preferred by visitors of particular hotels?” could lead to analyses that provide useful information for destinations and hospitality managers. Another promising direction would be to study tourist networks dynamically, which might be useful when a destination is undergoing rapid growth or rapid decline. The demonstrated approach, in our view, has the potential to be used to study tourist response to natural disasters, acts of terrorism, or crime epidemics at a destination from the perspective of perceived safety, specifically, the consumption dynamics of hospitality services in the affected areas by locals and tourists.

In conclusion, this study contributes to the tourist segmentation literature, both theoretically and methodologically, by proposing and demonstrating an approach to segmenting tourists using network analysis with user-generated content. One reported solution is the tourists-attractions network, which represents a case of an a posteriori segmentation based on tourists’ behavior as recorded in their online reviews of destination attractions. This solution identifies four large clusters of attractions with notable geographical separation in the destination area.
The second segmentation solution utilizes a mixed segmentation approach in which a priori information about tourists (in this case, interests and age) is used in combination with a posteriori visitation information. Both networks have three large attraction clusters. From the standpoint of destination management, the interests-attractions network is the most informative solution out of the three reported in this paper. A number of cross-checks are employed to validate the findings, such as comparisons against randomly simulated networks, geovisualization of discovered attraction communities, and additional network analysis algorithms, to ensure the robustness of the solution and confidence in the reported findings.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.annals.2018.09.002.

References

Albert, R., & Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47.
Andreu, L., Kozak, M., Avci, N., & Cifter, N. (2006). Market segmentation by motivations to travel: British tourists visiting Turkey. Journal of Travel & Tourism


Marketing, 19, 1–14. Asero, V., Gozzo, S., & Tomaselli, V. (2016). Building tourism networks through tourist mobility. Journal of Travel Research, 55, 751–763. Baggio, R. (2011). Collaboration and cooperation in a tourism destination: A network science approach. Current Issues in Tourism, 14, 183–189. Baggio, R., & Sainaghi, R. (2016). Mapping time series into networks as a tool to assess the complex dynamics of tourism systems. Tourism Management, 54, 23–33. Beckett, S. J. (2016). Improved community detection in weighted bipartite networks. Royal Society Open Science, 3, 140536. Bedi, P., & Sharma, C. (2016). Community detection in social networks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 6, 115–135. Borgatti, S. P., & Everett, M. G. (1997). Network analysis of 2-mode data. Social Networks, 19, 243–269. Choi, W. M., & Tsang, C. K. L. (2000). Activity based segmentation on pleasure travel market of Hong Kong private housing residents. Journal of Travel & Tourism Marketing, 8, 75–97. David-Negre, T., Hernández, J. M., & Moreno-Gil, S. (2018). Understanding tourists’ leisure expenditure at the destination: A social network analysis. Journal of Travel & Tourism Marketing, 1–16. Dolnicar, S. (2004). Insights into sustainable tourists in Austria: A data-based a priori segmentation approach. Journal of Sustainable Tourism, 12, 209–218. Dolničar, S. (2004). Beyond “commonsense segmentation”: A systematics of segmentation approaches in tourism. Journal of Travel Research, 42, 244–250. Dolnicar, S. (2005). Understanding barriers to leisure travel: Tourist fears as a marketing basis. Journal of Vacation Marketing, 11, 197–208. Dolnicar, S. (2008). Market segmentation in tourism. Tourism Management, Analysis, Behaviour and Strategy, 129–150. Dormann, C. F., & Strauss, R. (2014). A method for detecting modules in quantitative bipartite networks. Methods in Ecology and Evolution, 5, 90–98. Feng, S., Xing, L., Gogar, A., & Choi, Y. (2012). 
Distributional footprints of deceptive product reviews. ICWSM, 12, 98–105. Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75–174. Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659, 1–44. Frochot, I., & Morrison, A. M. (2000). Benefit segmentation: A review of its applications to travel and tourism research. Journal of Travel & Tourism Marketing, 9, 21–45. Good, B. H., de Montjoye, Y.-A., & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical Review E, 81, 046106. Gretzel, U., Werthner, H., Koo, C., & Lamsfus, C. (2015). Conceptual foundations for understanding smart tourism ecosystems. Computers in Human Behavior, 50, 558–563. Hsieh, S., O’Leary, J. T., & Morrison, A. M. (1992). Segmenting the international travel market by activity. Tourism Management, 13, 209–223. Hsu, F.-M., Lin, Y.-T., & Ho, T.-K. (2012). Design and implementation of an intelligent recommendation system for tourist attractions: The integration of EBM model, Bayesian network and Google Maps. Expert Systems with Applications, 39, 3257–3264. Huang, Y., & Bian, L. (2009). A Bayesian network and analytic hierarchy process based personalized recommendations for tourist attractions over the Internet. Expert Systems with Applications, 36, 933–943. Jamal, T., Smith, B., & Watson, E. (2008). Ranking, rating and scoring of tourism journals: Interdisciplinary challenges and innovations. Tourism Management, 29, 66–78. Kastenholz, E., Davis, D., & Paul, G. (1999). Segmenting tourism in rural areas: The case of North and Central Portugal. Journal of Travel research, 37, 353–363. Kidd, J. N., King, B. E., & Whitelaw, P. A. (2004). A profile of farmstay visitors in Victoria, Australia and preliminary activity-based segmentation. Journal of Hospitality & Leisure Marketing, 11, 45–64. Kirilenko, A., Stepchenkova, S., 2018. Tourism research from its inception to present day: A data mining approach. 
https://doi.org/10.13140/RG.2.2.23586.32962.
Lancichinetti, A., Radicchi, F., & Ramasco, J. J. (2010). Statistical significance of communities in networks. Physical Review E, 81, 046110.
Lancichinetti, A., Radicchi, F., Ramasco, J. J., & Fortunato, S. (2011). Finding statistically significant communities in networks. PLoS One, 6, e18961.
Library of Congress. (2018). Dr. Martin Luther King Jr., behind bars in jail in St. Augustine, Florida [WWW Document]. http://www.loc.gov/pictures/item/96516146/ (accessed 5.27.18).
Liu, B., Huang, S. S., & Fu, H. (2017). An application of network analysis on tourist attractions: The case of Xinjiang, China. Tourism Management, 58, 132–141.
Liu, X., & Murata, T. (2010). An efficient algorithm for optimizing bipartite modularity in bipartite networks. Journal of Advanced Computational Intelligence and Intelligent Informatics, 14, 408–415.
Marine-Roig, E., & Clavé, S. A. (2015). Tourism analytics with massive user-generated content: A case study of Barcelona. Journal of Destination Marketing & Management, 4, 162–172.
Mazanec, J. A. (1992). Classifying tourists into market segments: A neural network approach. Journal of Travel & Tourism Marketing, 1, 39–60.
McKercher, B., Ho, P. S., Cros, H. D., & So-Ming, B. C. (2002). Activities-based segmentation of the cultural tourism market. Journal of Travel & Tourism Marketing, 12, 23–46.
Morrison, A., Lynch, P., & Johns, N. (2004). International tourism networks. International Journal of Contemporary Hospitality Management, 16, 197–202.
Moscardo, G., Pearce, P., Morrison, A., Green, D., & O’Leary, J. T. (2000). Developing a typology for understanding visiting friends and relatives markets. Journal of Travel Research, 38, 251–259.
Mumuni, A. G., & Mansour, M. (2014). Activity-based segmentation of the outbound leisure tourism market of Saudi Arabia. Journal of Vacation Marketing, 20, 239–252.
National Park Service. (2018). St. Augustine Town Plan Historic District [WWW Document]. https://www.nps.gov/nr//travel/geo-flor/24.htm (accessed 5.25.18).
Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.
Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32, 245–251.
Pantano, E., Priporas, C.-V., & Stylos, N. (2017). ‘You will like it!’ Using open data to predict tourists’ response to a tourist attraction. Tourism Management, 60, 430–438.
Park, D.-B., & Yoon, Y.-S. (2009). Segmentation by motivation in rural tourism: A Korean case study. Tourism Management, 30, 99–108.
Pavlovich, K. (2003). The evolution and transformation of a tourism destination network: The Waitomo Caves, New Zealand. Tourism Management, 24, 203–216.
Romesburg, C. (2004). Cluster analysis for researchers. Lulu.com.
Scott, J. (2012). What is social network analysis? A&C Black.
Shen, H., Cheng, X., Cai, K., & Hu, M.-B. (2009). Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and Its Applications, 388, 1706–1712.
Shih, H.-Y. (2006). Network characteristics of drive tourism destinations: An application of network analysis in tourism. Tourism Management, 27, 1029–1039.
Smith, W. R. (1956). Product differentiation and market segmentation as alternative marketing strategies. Journal of Marketing, 21, 3–8.
St Augustine, Ponte Vedra, & The Beaches Visitors and Convention Bureau. (2015). St. Augustine, Florida to celebrate 450th anniversary in 2015 [WWW Document]. http://www.floridashistoriccoast.com/sites/default/master/files/pdfs/faq_450th.pdf (accessed 5.22.15).
Tinsley, R., & Lynch, P. (2001). Small tourism business networks and destination development. International Journal of Hospitality Management, 20, 367–378.
Tran, M. T., Jeeva, A. S., & Pourabedin, Z. (2016). Social network analysis in tourism services distribution channels. Tourism Management Perspectives, 18, 59–67.
Tremblay, P. (1998). The economic organization of tourism. Annals of Tourism Research, 25, 837–859.
Trend, N. (2013). TripAdvisor and the issue of trust. The Telegraph.
van der Zee, E., & Vanneste, D. (2015). Tourism networks unravelled; a review of the literature on networks in tourism management studies. Tourism Management Perspectives, 15, 46–56.
Vermeulen, I. E., & Seegers, D. (2009). Tried and tested: The impact of online hotel reviews on consumer consideration. Tourism Management, 30, 123–127.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge University Press.
Williams, S. A., Terras, M. M., & Warwick, C. (2013). What do people study when they study Twitter? Classifying Twitter related academic papers. Journal of Documentation, 69, 384–410.
Xiang, Z., & Gretzel, U. (2010). Role of social media in online travel information search. Tourism Management, 31, 179–188.
Xie, J., Kelley, S., & Szymanski, B. K. (2013). Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys (CSUR), 45, 43.


Yuan, Y., Gretzel, U., & Tseng, Y.-H. (2015). Revealing the nature of contemporary tourism research: Extracting common subject areas through bibliographic coupling. International Journal of Tourism Research, 17, 417–431.
Zhang, Z., Ye, Q., Law, R., & Li, Y. (2010). The impact of e-word-of-mouth on the online popularity of restaurants: A comparison of consumer reviews and editor reviews. International Journal of Hospitality Management, 29, 694–700.

Dr. Juan M. Hernández ([email protected]) is Associate Professor at the University of Las Palmas de Gran Canaria, Institute of Tourism and Sustainable Economic Development (TIDES), Spain. His main line of research has been the application of dynamical models to represent complex systems in nature-based industries, such as tourism and fisheries. Recently, he has been particularly interested in the application of social network analysis to tourism, from both theoretical and empirical points of view.

Dr. Andrei P. Kirilenko (andrei.kirilenko@ufl.edu) is Associate Professor in the Dept. of Tourism, Recreation and Sport Management at the University of Florida. He received his Ph.D. in Computer Science and held positions at the Center for Ecology & Forest Productivity, Russia, European Forest Institute, Finland, U.S. Environmental Protection Agency laboratory, OR, Purdue University and University of North Dakota. His research interests include big data analysis, data mining, tourism analytics, climate change impacts, and sustainability issues.

Dr. Svetlana Stepchenkova (svetlana.step@ufl.edu) is Associate Professor in the Dept. of Tourism, Recreation and Sport Management at the University of Florida. Her research interests are in the areas of destination marketing, branding, and positive image building. She studies tourism behavior and the effectiveness of destination promotion in situations of strained bilateral relations between nations. She is also interested in the usability of user-generated content for managerial decision making in destination marketing.
