Accepted Manuscript
Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance
Nathaniel E. Helwig, Yizhao Gao, Shaowen Wang, Ping Ma
PII: S2211-6753(15)00076-7
DOI: http://dx.doi.org/10.1016/j.spasta.2015.09.002
Reference: SPASTA 128
To appear in:
Spatial Statistics
Received date: 25 March 2015 Accepted date: 2 September 2015 Please cite this article as: Helwig, N.E., Gao, Y., Wang, S., Ma, P., Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance. Spatial Statistics (2015), http://dx.doi.org/10.1016/j.spasta.2015.09.002 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance
Nathaniel E. Helwig a,b,∗, Yizhao Gao c, Shaowen Wang c,d, Ping Ma e
a Department of Psychology, University of Minnesota, Minneapolis, MN, 55455-0366
b School of Statistics, University of Minnesota, Minneapolis, MN, 55455-0493
c Geography and Geographic Information Science, University of Illinois, Champaign, IL, 61820-6371
d National Center for Supercomputing Applications, University of Illinois, Urbana, IL, 61801-2311
e Department of Statistics, University of Georgia, Athens, GA, 30602-5029
Abstract
Social media have become an integral part of life for many individuals, and social media websites generate incredible amounts of data on a variety of societal topics. Furthermore, some social media posts contain geolocation information, so social media data can be viewed as a spatiotemporal phenomenon. To understand spatiotemporal trends in ultra-large sample social media data, we propose a novel application of the Smoothing Spline Analysis of Variance (SSANOVA) framework, which is a nonparametric approach capable of discovering latent functional relationships in noisy data. Unlike currently available approaches, our proposed SSANOVA framework (a) makes few assumptions about the nature of the spatiotemporal trend, (b) provides a means of assessing the uncertainty of the estimated spatiotemporal trend, and (c) is scalable to massive samples of social media data. To demonstrate the potential of our approach, we model the daily spatiotemporal Twitter trend in the United States. Our results reveal that the proposed SSANOVA approach can provide accurate and informative estimates of spatiotemporal social media trends, as well as useful information about the precision of the estimated spatiotemporal trends.
Keywords: Smoothing spline, Social media, Spatial smoothing, Spatiotemporal smoothing
∗ Corresponding author
Email addresses: [email protected] (Nathaniel E. Helwig), [email protected] (Yizhao Gao), [email protected] (Shaowen Wang), [email protected] (Ping Ma)
Preprint submitted to Spatial Statistics
July 22, 2015
1. Introduction
Social media (e.g., Facebook, Twitter, Flickr) have become part of the social fabric of our society. For example, as of 2014, Facebook had passed 1.23 billion monthly active users, or more than 15% of the global population. By simplifying the sharing and dissemination of user-generated content, social media have changed the way individual-level information is generated, distributed, and exchanged (Kaplan and Haenlein, 2010). Massive streams of social media data provide alternatives to traditional data collection approaches like questionnaires or interviews for understanding people’s opinions and observations (Lampos, 2012) and, thus, are increasingly investigated by researchers in many domains. For example, social media have been used for forecasting box-office revenues for movies (Asur and Huberman, 2010), predicting the stock market (Bollen et al., 2011) and election results (Tsou et al., 2013), and estimating influenza activity (Achrekar et al., 2011; Corley et al., 2009; Culotta, 2010a,b; Lampos and Cristianini, 2010, 2012; Padmanabhan et al., 2014; Signorini et al., 2011).
Massive numbers of social media users are engaged at any moment in viewing and generating content, and many publish their location along with the content they generate (Wang et al., 2012). Hence, social media can be regarded as a major spatiotemporal data source. Twitter, for example, introduced location-based services in 2010, which opened new windows for studying spatiotemporal trends in social media data. In particular, the spatiotemporal characteristics of social media data have been used to study people’s mobility patterns (Cho et al., 2011), the transmission of disease (Sadilek et al., 2012), and the detection of events (Cheng and Wicks, 2014; Lee and Sumiya, 2010). Another key characteristic of social media is that they are accessible to the public.
Many social media services provide application programming interfaces (APIs) through which one can directly obtain a large set of social media content, making social media data an attractive spatiotemporal data source. Classic statistical methodologies are not equipped to model trends in massive spatiotemporal data sets (Fan et al., 2014), so the statistical modeling of social media data is a difficult task. A variety of approaches have been proposed for modeling trends in social media data (e.g., Cheng and Wicks, 2014; Cho et al., 2011; Lee and Sumiya, 2010; Sadilek
et al., 2012). However, few of the proposed approaches are easily applicable and/or extendable to analyzing different types of social media data. Furthermore, many of the proposed approaches focus on obtaining point estimates of the spatiotemporal trend, without providing a means of assessing the certainty of the estimate. For example, classic visualization tools such as histograms and kernel density estimates (KDEs), which are popular in these studies, do not offer information about the precision of the estimated trend.
In this paper, we propose a novel application of Smoothing Spline Analysis of Variance (SSANOVA), which is a nonparametric statistical framework for modeling functional relationships in noisy data. Using recent computational developments (Helwig, 2013; Helwig and Ma, 2015, in press; Ma et al., 2015), the SSANOVA approach provides a powerful and practical alternative to classic parametric modeling approaches. Furthermore, given the massive social media sample sizes and the asymptotic properties of the SSANOVA estimator, the SSANOVA approach has incredible potential for discovering genuine latent trends in social media data. In addition, the Bayesian interpretation of the smoothing spline (Wahba, 1983; Kim and Gu, 2004) makes it possible to assess the precision of the spatiotemporal trend at any point throughout the observed space-time domain. As we demonstrate in the following pages, our proposed SSANOVA framework offers a flexible and statistically rigorous approach that can be applied to many social media data analysis problems.

2. Nonparametric Analysis of Social Media
2.1. Background on Social Media Analysis
Social media analysis begins with some semantic scanning of the social media post. Using Twitter and movie box-office revenues as an example, one would scan the 140 characters of each individual tweet using classification rules (e.g., existence of words in movie titles, hashtag of the star actors, tweet mood, etc.).
Each tweet is then assigned either a binary classification (1=movie tweet, 0=other) or a probabilistic classification (p is probability of movie tweet). Taking the geolocation and time stamp information into account, it is possible to obtain the count (or probability) of the number of tweets about a particular movie in any
given spatiotemporal region of the world. We can then model the count (or probability) of the number of movie tweets as a function of space and/or time to understand how information about the movie disseminates through the world. Knowing where and when potential movie tweets are occurring could be quite useful for marketing purposes (e.g., for location-specific ad campaigns) and for forecasting revenues in different regions of the world.

2.2. Smoothing Spline ANOVA Models
To model the functional relationship between the tweets and space/time, we can use a Smoothing Spline ANOVA (SSANOVA; Gu, 2013; Helwig, 2013; Helwig and Ma, 2015; Wahba, 1990), which is a nonparametric statistical method for modeling functional relationships in noisy data. The SSANOVA approach can be considered a flexible extension of the multiple regression model, where the parametric (linear) effects are replaced by nonparametric (nonlinear) effects. If we let $y_i$ denote the intensity of tweets at each spatiotemporal location (e.g., estimated as the log of the tweet count in each bin), the SSANOVA model can be written as

$$y_i = \eta(\mathbf{x}_i) + \epsilon_i \tag{1}$$

where $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})$ contains the covariates (tweet GPS coordinates and time stamp), $\eta$ is the unknown smooth function relating the response and covariates, and $\epsilon_i \overset{\mathrm{iid}}{\sim} N(0, \sigma^2)$ is Gaussian measurement error. See Section 4 for relaxing the Gaussian assumption. Typically, $\eta$ is estimated by minimizing the penalized least-squares functional

$$\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \eta(\mathbf{x}_i)\right)^2 + \lambda J(\eta) \tag{2}$$

where $J$ is a nonnegative penalty functional quantifying the roughness of $\eta$, and $\lambda \in (0, \infty)$ is a smoothing parameter that balances the trade-off between fitting and smoothing the data. Note that as $\lambda \to 0$, the bias (variance) of the estimate $\hat{\eta}_\lambda$ decreases (increases), and as $\lambda \to \infty$ the bias (variance) of the estimate $\hat{\eta}_\lambda$ increases (decreases). When analyzing noisy data, setting $\lambda$ too small will capture irrelevant noise, whereas setting $\lambda$ too large will introduce too much bias to the estimated spatiotemporal trend. Consequently, to obtain an
estimate $\hat{\eta}_\lambda$ with an optimal mean squared error (MSE = bias$^2$ + variance), it is necessary to find a reasonable balance between fitting and smoothing the data.
The function $\eta$ is estimated in a tensor-product reproducing kernel Hilbert space (RKHS) $\mathcal{H}$. Given fixed smoothing parameters and a set of randomly selected knots $\{\breve{\mathbf{x}}_h\}_{h=1}^{q} \subset \{\mathbf{x}_i\}_{i=1}^{n}$, the $\eta_\lambda$ minimizing Equation (2) can be approximated using

$$\eta_\lambda(\mathbf{x}) = \sum_{v=1}^{m} d_v \phi_v(\mathbf{x}) + \sum_{h=1}^{q} c_h \rho_c(\mathbf{x}, \breve{\mathbf{x}}_h) \tag{3}$$

where $\{\phi_v\}_{v=1}^{m}$ span the null space, $\rho_c$ is the reproducing kernel (RK) of the contrast space, and $\mathbf{d} = \{d_v\}$ and $\mathbf{c} = \{c_h\}$ are the unknown function coefficient vectors (see Helwig and Ma, 2015; Kim and Gu, 2004; Gu and Wahba, 1991; Ma et al., 2015). By definition $\rho_c = \sum_{k=1}^{s} \theta_k \rho_k^*$, where $\rho_k^*$ denotes the RK of $\mathcal{H}_k^*$ (a subspace of $\mathcal{H}$), and $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_s)'$ are nonnegative smoothing parameters. The smoothing parameters are typically selected by minimizing Craven and Wahba’s (1979) generalized cross-validation (GCV) score. The estimates $\hat{\lambda}$ and $\hat{\boldsymbol{\theta}}$ that minimize the GCV score have desirable asymptotic properties (see Helwig and Ma, 2015; Kim and Gu, 2004; Gu and Wahba, 1991; Li, 1987), so we use the GCV score throughout this paper. Given estimates $\hat{\mathbf{d}}$ and $\hat{\mathbf{c}}$, the fitted values have the form

$$\hat{\boldsymbol{\eta}} = \mathbf{K}\hat{\mathbf{d}} + \mathbf{J}_\theta \hat{\mathbf{c}} = \mathbf{S}_\theta \mathbf{y} \tag{4}$$

where $\mathbf{K} = \{\phi_v(\mathbf{x}_i)\}_{n \times m}$ and $\mathbf{J}_\theta = \{\rho_c(\mathbf{x}_i, \breve{\mathbf{x}}_h)\}_{n \times q}$ are null and contrast space basis function matrices (respectively), and $\mathbf{S}_\theta$ is the smoothing matrix, which is the nonparametric extension of the “hat” matrix from the classic multiple regression model.

2.3. Spatiotemporal Correlation and Bayesian Inference
Although it may not be obvious from Equation (1), the SSANOVA model can be considered a data-driven approach for modeling spatiotemporal correlation. Linear mixed effects (LME) regression models are a popular approach for modeling correlated data, and there is a correspondence between smoothing spline models and LME models (see Gu and Ma,
2005b; Wang, 1998a,b; Zhang et al., 1998). When using Kim and Gu’s (2004) selected knots approximation, the classic correspondence needs to be modified as follows. Consider an LME model of the form

$$\mathbf{y} = \mathbf{K}\mathbf{d} + \mathbf{J}_\theta \mathbf{c} + \boldsymbol{\epsilon}$$

where $\mathbf{K}\mathbf{d}$ are the fixed effects, $\mathbf{c} \sim N(\mathbf{0}, \sigma^2 \mathbf{Q}_\theta^{\dagger} / (n\lambda))$ are the Gaussian random effects with $\mathbf{Q}_\theta = \{\rho_c(\breve{\mathbf{x}}_i, \breve{\mathbf{x}}_h)\}_{q \times q}$ and $(\cdot)^{\dagger}$ denoting the pseudoinverse (Moore, 1920; Penrose, 1950), and $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$ is Gaussian measurement error. Fixing the smoothing parameters $\lambda$ and $\boldsymbol{\theta}$, the solution to the mixed model equations

$$\begin{pmatrix} \mathbf{K}'\mathbf{K} & \mathbf{K}'\mathbf{J}_\theta \\ \mathbf{J}_\theta'\mathbf{K} & \mathbf{J}_\theta'\mathbf{J}_\theta + n\lambda \mathbf{Q}_\theta \end{pmatrix} \begin{pmatrix} \mathbf{d} \\ \mathbf{c} \end{pmatrix} = \begin{pmatrix} \mathbf{K}'\mathbf{y} \\ \mathbf{J}_\theta'\mathbf{y} \end{pmatrix}$$

results in the same $\hat{\mathbf{d}}$ and $\hat{\mathbf{c}}$ estimates as the SSANOVA model. Consequently, the SSANOVA solution can be considered an LME model where the spatiotemporal correlation is estimated by tuning the smoothing parameters, which control the influence of the spatial and temporal marginal RKs on the solution.
The argument given above can be slightly adjusted to arrive at the Bayesian interpretation of the SSANOVA model (see Kim and Gu, 2004; Kimeldorf and Wahba, 1970). Using this interpretation, it is possible to assess the precision of the estimate $\hat{\eta}_i$ using Wahba’s (1983) “Bayesian confidence intervals”:

$$\hat{\eta}_i \pm Z^*_{1-\alpha/2} \, \hat{V}_{i(\eta|y)}^{1/2} \tag{5}$$

where $1-\alpha$ is the confidence level, $Z^*_{1-\alpha/2}$ is the standard normal quantile with $\alpha/2$ in the upper tail, and $\hat{V}_{i(\eta|y)}$ is the estimated posterior variance of $\hat{\eta}_i$ given the data $\mathbf{y}$; note that $\hat{V}_{i(\eta|y)} = \hat{\sigma}^2 s_{ii(\theta)}$ for the observed data points, where $s_{ii(\theta)}$ is the $i$-th diagonal element of $\mathbf{S}_\theta$ and $\hat{\sigma}^2$ is the estimated error variance. Assuming the smoothing parameters have been selected using the GCV score, intervals formed according to Equation (5) have a desirable “across-the-function” coverage property (see Gu and Wahba, 1993; Wahba, 1983; Nychka, 1988), e.g., the 95% Bayesian CI can be expected to contain about 95% of the true $\eta(\mathbf{x}_i)$ values.
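As a minimal one-dimensional sketch of Equations (2) to (5), the Python code below fits a cubic smoothing spline with randomly selected knots by solving the mixed model equations, picks λ from a small grid by GCV, and forms 95% Bayesian confidence intervals from the diagonal of the smoothing matrix. The kernel formula (a standard form built from scaled Bernoulli polynomials on [0, 1]), the simulated data, the λ grid, the error variance estimate RSS/(n − tr S), and all function names are assumptions made for illustration; a single λ stands in for (λ, θ), and this is not the authors' implementation.

```python
import numpy as np

def k1(x):
    return x - 0.5

def k2(x):
    return (k1(x) ** 2 - 1.0 / 12.0) / 2.0

def k4(x):
    return (k1(x) ** 4 - k1(x) ** 2 / 2.0 + 7.0 / 240.0) / 24.0

def cubic_rk(x, z):
    """Assumed cubic smoothing spline RK on [0, 1], evaluated for all pairs."""
    x = np.asarray(x, dtype=float)[:, None]
    z = np.asarray(z, dtype=float)[None, :]
    return k2(x) * k2(z) - k4(np.abs(x - z))

def fit_spline(x, y, knots, lam):
    """Solve the mixed model equations for a fixed lam; return fit, GCV, CI half-width."""
    n = len(y)
    K = np.column_stack([np.ones(n), k1(x)])   # null space basis (constant, linear)
    J = cubic_rk(x, knots)                     # contrast space basis, n x q
    Q = cubic_rk(knots, knots)                 # penalty matrix, q x q
    X = np.hstack([K, J])
    P = np.zeros((X.shape[1], X.shape[1]))
    P[K.shape[1]:, K.shape[1]:] = n * lam * Q  # only c is penalized
    S = X @ np.linalg.pinv(X.T @ X + P) @ X.T  # smoothing matrix
    fitted = S @ y
    df = np.trace(S)
    rss = float(np.sum((y - fitted) ** 2))
    gcv = n * rss / (n - df) ** 2
    sigma2 = rss / (n - df)                    # assumed error variance estimate
    half = 1.96 * np.sqrt(sigma2 * np.clip(np.diag(S), 0.0, None))
    return fitted, gcv, half

# Simulated data (hypothetical): a sine trend plus Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0.0, 1.0, n))
truth = np.sin(2.0 * np.pi * x)
y = truth + rng.normal(0.0, 0.3, n)
knots = rng.choice(x, size=20, replace=False)  # randomly selected knots

# Grid search over lam, keeping the fit with the smallest GCV score.
results = [fit_spline(x, y, knots, lam) for lam in 10.0 ** np.arange(-10, 1)]
fitted, gcv, half = min(results, key=lambda r: r[1])
coverage = float(np.mean(np.abs(fitted - truth) <= half))
rmse = float(np.sqrt(np.mean((fitted - truth) ** 2)))
```

Here `coverage` is the share of observed points whose interval contains the simulated truth, which is the "across-the-function" quantity described above; in a spatiotemporal fit the same steps apply with the tensor product kernel in place of `cubic_rk`.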
2.4. SSANOVA Kernels for Spatiotemporal Data
After binning the social media data into spatiotemporal regions (see Appendix), we use the model in Equation (1) to model the log of the number of “points of interest” falling within each spatiotemporal bin. A point of interest could be all tweets with a particular hashtag, all tweets containing a particular word or word(s), all tweets from a particular user, or all tweets meeting some other criteria. The definition of a point of interest will be application-specific, but the SSANOVA approach is general.
For spatiotemporal smoothing, we propose using a cubic thin-plate spline for the spatial effect and a cubic smoothing spline (either unconstrained or periodic) for the temporal effect (see Helwig and Ma, 2015). The thin-plate spline (TPS) is an isotropic multivariate smoother capable of smoothing spatial data (see Gu, 2013; Wahba, 1990; Wood, 2003, 2006). For two-dimensional (longitude, latitude) data, the cubic TPS penalty functional has the form

$$J_1(\eta) = \int_{\mathcal{X}} \left[ \left( \frac{\partial^2 \eta(\mathbf{x})}{\partial x_1^2} \right)^2 + \left( \frac{\partial^2 \eta(\mathbf{x})}{\partial x_2^2} \right)^2 + 2 \left( \frac{\partial^2 \eta(\mathbf{x})}{\partial x_1 \partial x_2} \right)^2 \right] d\mathbf{x} \tag{6}$$

where $\mathbf{x} = (x_1, x_2)$ contains the two-dimensional predictor (longitude and latitude), and $\mathbf{x} \in \mathcal{X} \subset \mathbb{R}^2$ is the predictor domain. Consequently, the cubic TPS defines spatial smoothness isotropically across longitude and latitude values, and is invariant to the scale of the spatial data. As a result, a cubic TPS retains spatial interrelations in the data, which is necessary for understanding spatiotemporal patterns in social media data. See Gu (2013), Wahba (1990), and Wood (2003) for more information and examples revealing the power of TPS models for smoothing spatial data.
The cubic smoothing spline (SS) is a univariate smoother capable of smoothing temporal data (see Gu, 2013; Wahba, 1990). The cubic SS penalty functional has the form

$$J_2(\eta) = \int_0^1 \left[ \ddot{\eta}(t) \right]^2 dt \tag{7}$$

where $t \in [0, 1]$ is the time stamp transformed to the interval [0,1], and $\ddot{\eta}$ denotes the second derivative of $\eta$ with respect to $t$. Taken together, a tensor product spline formed with a
cubic TPS marginal (for the two-dimensional space effect) and a cubic SS marginal (for the time effect) offers a flexible and powerful nonparametric framework that is capable of quantifying smoothness in spatiotemporal social media patterns. See Helwig and Ma (2015) for further information on the formation of the RKs for the cubic TPS and SS, as well as an application of SSANOVA for spatiotemporal smoothing of oceanographic data.

2.5. SSANOVA Model Building
When fitting the spatiotemporal model, there are two models that could be considered:

$$\begin{aligned} \text{additive}: &\quad \eta = \eta_0 + \eta_s + \eta_t \\ \text{interaction}: &\quad \eta = \eta_0 + \eta_s + \eta_t + \eta_{st} \end{aligned} \tag{8}$$

where $\eta_0$ is a constant function, $\eta_s$ and $\eta_t$ denote the main effect functions for space and time (respectively), and $\eta_{st}$ denotes the interaction effect function. The additive model assumes that the spatiotemporal trend can be completely explained by the marginal spatial and temporal trends; stated differently, the additive model assumes that the temporal social media trend is homogeneous across the different spatial locations. In contrast, the interaction model allows each location to have a unique temporal trend, which is determined by the interaction effect function $\eta_{st}$.
The choice between the additive and interaction model will depend on the particular application. For example, if the goal is to obtain the daily (or weekly or monthly or yearly) social media trend for a homogeneous spatial region, the additive model may be more useful because it offers a parsimonious representation of the spatiotemporal trend. In contrast, if the goal is to examine the spatial heterogeneity in the temporal social media pattern, it would be useful to compare both the additive and interaction models. The additive model is a constrained (nested) version of the interaction model, so the SSANOVA approach makes it possible to examine the significance of the space-time interaction, i.e., $\eta_{st}$. If the inclusion of the interaction effect $\eta_{st}$ produces a large improvement in the model’s fit, this is evidence that the spatiotemporal social media trend cannot be well explained by the marginal spatial and temporal trends, i.e., there is spatial heterogeneity in the temporal trend.
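The idea behind comparing the two models in Equation (8) can be illustrated with a small Python sketch on a complete grid of locations by time bins. It uses an unsmoothed cell-means ANOVA decomposition (grand mean plus row and column effects) as a stand-in for the additive fit, so it is not the SSANOVA estimator; the grids, the hourly offsets, and the function names are hypothetical.

```python
import math

def make_grid(offsets, levels, n_hours=24):
    """One row per location: a location level plus a daily cosine cycle
    shifted by that location's offset (in hours)."""
    return [[lvl + math.cos(2 * math.pi * (t - off) / n_hours)
             for t in range(n_hours)]
            for off, lvl in zip(offsets, levels)]

def additive_fit(y):
    """Grand mean + location effect + time effect for a complete grid."""
    n_s, n_t = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_s * n_t)
    row = [sum(r) / n_t - grand for r in y]
    col = [sum(y[s][t] for s in range(n_s)) / n_s - grand for t in range(n_t)]
    return [[grand + row[s] + col[t] for t in range(n_t)] for s in range(n_s)]

def r_squared(y, fit):
    """Share of the variation in y that is reproduced by fit."""
    flat_y = [v for r in y for v in r]
    flat_f = [v for r in fit for v in r]
    mean = sum(flat_y) / len(flat_y)
    ss_tot = sum((v - mean) ** 2 for v in flat_y)
    ss_res = sum((a - b) ** 2 for a, b in zip(flat_y, flat_f))
    return 1.0 - ss_res / ss_tot

# Same daily cycle everywhere, different levels: additive structure.
same_clock = make_grid([0, 0, 0, 0], [1.0, 2.0, 3.0, 4.0])
# Same cycle shifted by 0 to 3 hours per location: needs an interaction.
shifted = make_grid([0, 1, 2, 3], [0.0, 0.0, 0.0, 0.0])

r2_same = r_squared(same_clock, additive_fit(same_clock))
r2_shifted = r_squared(shifted, additive_fit(shifted))
```

In this toy setting the additive fit reproduces `same_clock` essentially exactly, whereas for `shifted` the part of the variation it misses, 1 − `r2_shifted`, is what an interaction term would have to absorb.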
Note that spatial heterogeneity in the temporal social media pattern could be due to a variety of causes. For example, if there is an “event” that occurs in one spatial region (but not other regions), we may expect to see differences in the temporal social media patterns for different spatial regions. As another example, when analyzing data from multiple time zones, e.g., across the four time zones in the continental United States, we may expect to see differences in the temporal social media pattern due to a different temporal offset for each spatial region (time zone). Consequently, by comparing additive and interaction SSANOVA models, it is possible to assess the degree to which local spatial events and time zone differences affect the homogeneity of spatiotemporal patterns in social media data.

3. Twitter Example
3.1. Data
To demonstrate the potential of our approach, we focus on analyzing the daily spatiotemporal Twitter trend in the United States of America (USA); note that the USA is the country with the largest number of Twitter users (Lipman, 2014; Twitter, 2015). Our sample of data contains a total of 10,005,301 tweets (with available GPS information) from the USA. This sample of data was collected over the course of a typical work week (Monday–Friday) in January. The spatial distribution of the tweets is representative of the country (Figure 1, top); it is interesting to note that the tweet locations highlight the major cities throughout the USA. Furthermore, from the tweet time stamps (see Figure 1, bottom), it is evident that there is a periodic trend in the Twitter activity; in particular, there is a substantial drop in Twitter activity from approximately midnight (00:00 CST) to mid-morning (10:00 CST). We focus on analyzing these general spatiotemporal trends using the SSANOVA approach outlined in the previous section.
Finally, given that Friday seems to have a distinct temporal trend (particularly in the evening), we restrict our primary analyses to the 8,102,527 tweets collected between Monday and Thursday. The possibility of a distinct temporal trend on Friday is further examined in Section 4.
[Figure 1 image. Top panels: 50*25 bins, 100*50 bins, and 150*75 bins (axes: Longitude, Latitude; color scale: log(# tweets)). Bottom panel: 5 minute bins (axes: Time (24 hour, CST), Mon to Fri; log(# tweets)).]
Figure 1: Top: two-dimensional histograms (log scale) using various numbers of bins. Bottom: line plot binning tweets every five minutes. Maps were created using maps R package (Becker et al., 2013).
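The binning behind such histograms can be sketched in a few lines of Python. The code below assigns each tweet to a longitude by latitude by time bin and returns the log count for every non-empty bin. The bounding box, the tuple layout (longitude, latitude, minute of day), and the treatment of empty bins (simply omitted) are assumptions for illustration; the binning actually used in the paper is described in its Appendix.

```python
import math
from collections import Counter

def bin_index(value, lo, hi, nbins):
    """Map a value in [lo, hi] to a bin index in 0..nbins-1
    (the upper edge is placed in the last bin)."""
    frac = (value - lo) / (hi - lo)
    return min(int(frac * nbins), nbins - 1)

def bin_tweets(tweets, lon_range, lat_range, n_lon, n_lat, minutes_per_bin=5):
    """Count tweets per (lon bin, lat bin, time bin) and return log counts.

    tweets: iterable of (longitude, latitude, minute_of_day) tuples.
    Only non-empty bins appear in the result (an assumption of this sketch).
    """
    counts = Counter()
    for lon, lat, minute_of_day in tweets:
        i = bin_index(lon, lon_range[0], lon_range[1], n_lon)
        j = bin_index(lat, lat_range[0], lat_range[1], n_lat)
        k = int(minute_of_day // minutes_per_bin)
        counts[(i, j, k)] += 1
    return {key: math.log(c) for key, c in counts.items()}

# Three hypothetical tweets on a 50*25 spatial grid with 5 minute time bins.
example_tweets = [(-100.0, 40.3, 12.0), (-100.1, 40.4, 13.0), (-80.0, 30.2, 600.0)]
log_counts = bin_tweets(example_tweets, (-125.0, -65.0), (25.0, 50.0), 50, 25)
```

The first two hypothetical tweets fall in the same spatial cell and the same five minute window, so they share one bin; the resulting log counts would play the role of the response $y_i$ in Equation (1).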
3.2. Time Zone Assignments
The sample of tweets spans the four major time zones (TZs) of the continental USA: Pacific, Mountain, Central, and Eastern (see Figure 2a). However, the Twitter time stamps are recorded in Central Standard Time (CST) for all data. As mentioned in Section 2.5, the TZ offsets can be considered a source of heterogeneity in the spatiotemporal Twitter trend. Consequently, ignoring the TZ effect could exaggerate the influence of the interaction effect function ηst. To examine the amount of heterogeneity due to TZ differences, it is first necessary to map all of the tweets to a TZ according to the longitude and latitude coordinates. For TZ assignments, we use Eric Muller’s TZ shape files (freely obtainable from http://efele.net/maps/tz/), which are plotted in Figure 2a. Each tweet was assigned to a particular TZ using an angle summation algorithm (see Hormann and Agathos, 2001); more efficient point-in-polygon algorithms are available, but we have found that the angle summation approach produces robust TZ assignments for our Twitter data (see Figure 2b).
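A minimal Python sketch of an angle summation point-in-polygon test is given below. For a query point, the signed angles subtended by consecutive polygon vertices sum to roughly ±2π when the point is inside a simple polygon and to roughly 0 when it is outside. The rectangular "zones" and the function names are hypothetical placeholders rather than the actual TZ shape files, and points lying exactly on a boundary are not handled.

```python
import math

def angle_sum(point, polygon):
    """Sum of signed angles subtended at `point` by each polygon edge."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i][0] - px, polygon[i][1] - py
        x2, y2 = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # atan2(cross, dot) is the signed angle from vertex i to vertex i+1.
        total += math.atan2(x1 * y2 - y1 * x2, x1 * x2 + y1 * y2)
    return total

def point_in_polygon(point, polygon):
    """True if the angle sum is near +/- 2*pi (inside), False if near 0."""
    return abs(angle_sum(point, polygon)) > math.pi

def assign_zone(point, zones):
    """Return the name of the first zone polygon containing the point."""
    for name, polygon in zones.items():
        if point_in_polygon(point, polygon):
            return name
    return None

# Hypothetical rectangular zones in (longitude, latitude) coordinates.
zones = {
    "west": [(-125, 25), (-100, 25), (-100, 50), (-125, 50)],
    "east": [(-100, 25), (-65, 25), (-65, 50), (-100, 50)],
}
zone_a = assign_zone((-110.0, 40.0), zones)
zone_b = assign_zone((-90.0, 35.0), zones)
zone_c = assign_zone((0.0, 0.0), zones)
```

With real shape files each time zone may consist of several polygons, in which case the same test would be applied to every polygon belonging to a zone.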
[Figure 2 image (panels not recoverable from extraction). Axis: Latitude; legend: PST: −2, MNT: −1, CST: +0, EST: +1.]
●● ● ●● ● ● ●● ● ● ●● ● ●●●●● ●● ● ●●●●●●●●●●●● ●●●●●● ●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●● ● ●● ●●●●●●● ● ●●● ● ●●●●●●● ● ● ●● ● ● ● ● ● ●●●●●● ●●● ●●● ●● ●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●●●●●● ● ● ● ● ●●●●●●● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ●● ●● ●●● ●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●● ●●●●●●● ●● ●● ● ● ●●● ●●●●● ● ● ● ● ●● ● ● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●●● ●●●● ● ●●● ● ● ●●●●●●● ● ● ● ● ● ●● ● ● ● ● ● ●●●● ● ● ● ● ●●●● ● ● ●● ●● ●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●● ●●●● ● ●● ● ● ●●●● ● ● ● ● ●● ● ● ● ●●●● ● ● ●● ●●●●●●●●●●●●● ●●●●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●● ●● ●●●● ● ● ● ● ● ● ● ●● ● ●●●●●●●● ● ● ● ●● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●● ●●● ● ●●●● ●● ●● ● ●● ● ●● ●●● ●●●●●● ● ●●●●●●● ●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●●●●● ●●● ● ●● ●●●● ●●●●●●●●●● ●●● ● ●●● ● ●● ● ●●● ● ● ●●● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●●●●●●● ● ●●●● ● ●● ●●●● ●● ●●●●●● ●●●● ● ● ●●●●●●●●●● ● ● ●●●●●● ●●●●●●● ● ● ●●● ●●●●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ● ●●●●●●●●●●●● ● ●●● ● ●●● ●●●●●●● ● ●●●●●●●●●●●●●●●●● ● ● ●●● ●● ●●● ●●●●●●●●●●●●●●●●●●●●●●● ●●● 
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●●●●●●● ●●●●●●● ● ● ●●●●● ●● ●●●● ● ● ● ●● ● ● ●● ●● ●● ●● ●●●●●●●●●●●● ●●●●● ●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ●●●●●● ●●●● ●● ●● ● ●●●● ●● ●●●●● ● ● ● ● ●●● ●● ● ●● ● ●● ●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ●●●●●●●●●● ● ● ● ● ●● ● ●●●● ● ●● ●● ●● ● ● ●● ●●● ● ● ●● ●●● ●●●●● ●●●● ●●●● ●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●● ● ●● ●●● ● ● ●● ●● ● ● ●●●● ● ● ● ●●●● ●●●●●●● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●● ●●●● ● ●●● ● ●● ● ● ● ●● ●●●●●● ● ●●●● ●●●●● ●●●●●●●● ● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●● ● ●●● ●●●● ● ● ● ● ●●● ● ●● ●●● ●●●●● ●● ●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ● ● ●● ●●●●● ●● ● ● ● ● ● ●●●●● ● ●●●●●●● ●●●●● ●●● ●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●● ● ●● ●● ● ● ●●●●●●●● ● ●●●●● ●●●●●●●●● ●●●●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●● ● ●●●●●●●●●●●●● ●●● ●● ● ● ● ●●●●● ● ●●● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●●●●● ● ●●●●●●●●●● ●●●● ● ● ● ●●●●●●●●●●●● ●●●●● ●● ●●● ●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ● ● ●●●●●●● ●● ● ● ● ●●●● ●● ●●● ●● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ●● ●●●●●●●● ●● ●● ●● ●● ● ● ● ●● ● ●●●● ●●●● ● ●● ●● 
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●● ●●●● ● ●●●●● ●●●●● ● ●●●● ● ●● ● ●●● ●● ●●●●●●● ● ●● ●● ●●●●● ● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●● ●●●●●● ●●●●●●●●●● ●● ●●●● ●● ●●●●● ●●●●● ●● ●●● ●●●●●●●●●●● ● ● ●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●● ●●●●●● ●●● ● ●●● ● ● ● ● ● ● ●●●● ● ●●● ●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●●●● ●● ●● ● ●●● ●● ● ● ●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●● ●●●● ● ●● ●●●●●●●●●●●●●●● ●● ● ●●●●● ●●●●●●●●●● ● ●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●● ●●●● ●●●● ● ● ●●●●●●●●● ●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●● ●●●●●●● ● ●●●●● ● ●●●●●●● ● ●●●● ●●● ● ●●●● ●●● ●●● ●●●●●●●●●●●●●●●●● ●● ●●● ●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ● ●●● ● ● ●●●●● ●● ●●●●● ●● ●● ●●● ●●●●● ●●●●●●●●●●●●●●●● ●● ●●●● ●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●● ● ● ●● ● ●● ●● ●●●●●● ●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●● ● ●●●●●●●●●● ●● ●● ● ●● ●●● ●● ● ●● ●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●● ●● ● ● ●● ●●●●● ● ● ●● ●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●● ● ●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●● ● ● ●● ●●●●● ● ● ● ●● ●●●●●●●●●●●●●●● ● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●●● ●● ● ●● ● ● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ● ●●●●● ●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●● ●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●● ● ●●●●● ●●●●●●● ●● ●● 
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●● ●●●● ●●●●●●●●●● ●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ●● ●●●● ● ●●● ●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●● ● ●● ● ●●●●●●●●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●●●● ●● ● ●●●●●●●●●● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●●● ●● ●● ●●●●●●●●●●●● ● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●●●●●●●●●●●● ● ●● ● ●●●●●●●●●●●●●●●● ●●●●●●●●● ●●● ● ●●● ●●●●●●●●●●●● ● ● ●●●●●●●●● ●●● ●●●●●●● ● ● ● ●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●● ● ●●● ●● ● ●●●●●●●● ●●●●●●●●●● ●● ●●●● ● ●●●●●●●●●●● ●●●●●●●●●●● ●●●● ●●● ●●●●●●●● ●●●● ●●●●●●● ● ● ● ●●●●●●● ●●●●●●●●●● ● ●●●●● ●●●●●●●●● ●●●●●●●●●●●●● ● ●● ●●●●●●●●● ●●●●●●●●●● ●● ●●● ●●●●●●● ●●●●●●●●●●●●● ●● ● ●● ● ●●●●●●● ●●●● ● ●● ●●●●● ●●●●●●●●●●●●● ● ●● ● ● ●●●●●●●●●●●●● ● ● ●●●● ●●● ●●●● ● ●● ●●● ●● ●●●● ●● ● ● ●●●●●●●●●●●●● ● ●●● ● ●●●● ●●●●●● ●● ●●● ●●●● ● ●● ●●●●●●●● ●●● ● ●●●● ●●●●●●●● ●●●●●●●●●● ●●●● ●●●● ●●● ● ●●●●●● ●●● ●● ●●
30
● ● ●
25
30
Latitude
45
50
b) Time Zone Assignments
50
a) Time Zone Borders
25
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
−70
●
−120
Longitude
−110
−100
−90
−80
PST: −2 MNT: −1 CST: +0 EST: +1
−70
Longitude
Figure 2: a) Time zone borders defined by Eric Muller’s shape files, along with the Daylight Savings Time offset relative to Central Standard Time. b) Visualization of tweet time zone assignments obtained via the angle summation point-in-polygon algorithm.
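The angle summation test named in the caption can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names, the tolerance, and the two rectangular "time zones" are hypothetical stand-ins for the real shape-file polygons. The idea is that the signed angles subtended at the query point by consecutive polygon edges sum to ±2π for an interior point and to 0 for an exterior point.

```python
import math

def angle_sum_inside(point, polygon, tol=1e-6):
    """Return True if `point` lies inside `polygon` (a list of (x, y) vertices)."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Vectors from the query point to two consecutive vertices.
        ax, ay = x1 - px, y1 - py
        bx, by = x2 - px, y2 - py
        # Signed angle between the two vectors, in (-pi, pi].
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    # The total is +/- 2*pi for interior points and 0 for exterior points.
    return abs(abs(total) - 2.0 * math.pi) < tol

def assign_offset(lon, lat, zones, default=None):
    """Return the offset of the first zone whose polygon contains (lon, lat)."""
    for offset, polygon in zones.values():
        if angle_sum_inside((lon, lat), polygon):
            return offset
    return default

# Hypothetical rectangular zones (offset relative to CST, polygon vertices).
toy_zones = {
    "CST": (0, [(-100, 25), (-85, 25), (-85, 50), (-100, 50)]),
    "EST": (1, [(-85, 25), (-67, 25), (-67, 50), (-85, 50)]),
}
```

Points that fall exactly on a border are not handled specially in this sketch; a production version would need an explicit rule for them.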
3.3. Analyses

To demonstrate the power of the proposed SSANOVA model for spatiotemporal smoothing of social media data, we begin by binning the data according to both space and time (see Appendix). To make the spatiotemporal smoothing results (somewhat) comparable with Figure 1, we use 100 bins for the longitude values, 50 bins for the latitude values, and a temporal bin size of two hours with bin midpoints at {0:00, 2:00, . . . , 22:00, 24:00}.¹ We chose these particular bin sizes for our analyses because Figure 1 reveals that (a) 100 × 50 spatial bins provide reasonable spatial precision, and (b) the temporal trend changes rather smoothly throughout the day. The effect of the bin size on the SSANOVA solution is examined in Section 4. These spatiotemporal bin sizes resulted in n = 27,878 data points (non-empty bins) when all time stamps were measured in CST. When analyzing the data in Local Standard Time (LST), i.e., CST plus the TZ offset from Figure 2, the same bin sizes resulted in n = 27,847 data points (non-empty bins). The response y_i is defined as the log of the number of observations in the i-th bin, and x_i contains the bin centers, i.e., the spatial (longitude and latitude) and temporal (hour) coordinates corresponding to the midpoint of the spatiotemporal bin. We use q = 2n^{2/3} knots that are randomly sampled from the covariate domain, which should be enough knots to capture spatiotemporal functions where J(η) < ∞ is barely satisfied (see Gu, 2013; Gu and Kim, 2002; Kim and Gu, 2004).

¹ Defining the temporal bins in this fashion, m_j = 13 and b_j = (−1, 1, 3, 5, . . . , 23, 25); see Appendix.

Given that we are analyzing the daily spatiotemporal trend for a typical week of Twitter data, we used a cubic TPS for the marginal spatial effect and a periodic cubic SS for the marginal time effect. For both the CST and the LST data, we fit the SSANOVA model with and without the interaction effect (see Equation (8)). Note that comparing the additive and interaction models (separately for CST and LST data) makes it possible to assess the amount of spatial heterogeneity in the temporal Twitter trend that is due to the TZ offset.

The SSANOVA models are fit using the bigsplines R package (Helwig, 2015), which fits smoothing spline regression models using scalable algorithms designed for large samples. In particular, we use the bigssa function, which implements the scalable algorithm and efficient SSANOVA approximation described in Helwig and Ma (2015). We initialize the smoothing parameters using the smart starting algorithm (see Helwig and Ma, 2015), which is inspired by Algorithm 3.2 of Gu and Wahba (1991). After the initialization, we use the fully iterative GCV tuning (skip.iter=FALSE option) to estimate the smoothing parameters. It is possible to obtain a faster solution using the partial GCV tuning (skip.iter=TRUE), which fixes the smoothing parameters after the smart starting algorithm. However, for tensor product splines with cubic TPS and cubic SS marginals, we have found that the fully iterative GCV tuning provides a noticeable improvement when fitting the interaction model to this sample of social media data.

We also compare the SSANOVA solution with other data-driven approaches for modeling the Twitter data. In particular, we compare the SSANOVA results with regression trees (implemented in the rpart R package, Therneau et al., 2015) and with support vector regression (implemented in the e1071 R package, Meyer et al., 2014).
For the regression trees, we set the complexity parameter small (cp=1e-9) to examine the performance across a large number of partitions; the remaining control parameters were set at the default values (see Therneau et al., 2015). For the SV regression, we used a 10-fold cross-validation procedure to tune the ε parameter, searching a grid of values ε ∈ {0, 0.1, . . . , 0.9, 1}; the remaining control parameters were set at the default values (see Meyer et al., 2014). Finally, note that the regression tree does not require the specification of the model structure (i.e., additive versus interaction effects), whereas the SV regression model requires such a specification; consequently, we tried fitting both the additive and interaction models for SV regression.

3.4. Results

Table 1 reports the fit statistics for the four different SSANOVA models that were fit to the Twitter data. Examining the fit statistics for the CST data, we see that the additive and interaction models explain 78.2% and 79.7% of the data variation, respectively. Also, for the CST data, note that the AIC (Akaike, 1974) chooses the interaction model, whereas the BIC (Schwarz, 1978) chooses the more parsimonious additive model. When analyzing the data with all time stamps in CST, the interaction effect η_st captures the TZ offset, as well as all other sources of spatial heterogeneity in the temporal Twitter trend. Consequently, comparing the additive and interaction model R^2 values for the CST data reveals that including the TZ effect in the model can (at most) account for an additional 1.5% explained variation.

Table 1: Fit statistics for the smoothing spline ANOVA models with Friday excluded.

TZ    Model   GCV     R^2     AIC        BIC
CST   add     1.094   0.782   81564.09   92097.07
CST   int     1.052   0.797   80428.38   94394.93
LST   add     1.040   0.793   80066.68   90648.61
LST   int     1.056   0.797   80424.61   94810.51

Note. AIC/BIC = Akaike/Bayesian information criterion.

Examining the fit statistics for the LST data makes it possible to determine how much of the spatial heterogeneity in the temporal Twitter trend is due to the TZ offset. For the LST data, we see that the additive and interaction models explain 79.3% and 79.7% of the data variation, respectively. This reveals that (a) the TZ offset in the periodic temporal Twitter trend accounts for about 1% of the data variation, and (b) all other sources of spatial heterogeneity in the temporal Twitter trend account for about 0.4% of the data variation. Also, for the LST data, note that both the AIC and BIC choose the more parsimonious
[Figure 3 panels: ηs: Spatial Effect (Longitude by Latitude, shaded by log(# tweets)); ηt: Temporal Effect (Time (24h, LST) by log(# tweets)).]
Figure 3: Spatial and temporal effect functions for additive SSANOVA model fit to LST data.
additive model. Consequently, the model fit statistics reveal that the Twitter data can be well-explained by the additive SSANOVA model when analyzing the data in LST.

Figure 3 plots the estimated spatial and temporal effect functions (i.e., η̂_s and η̂_t) for the additive SSANOVA model fit to the LST data. Note that the SSANOVA solution makes it possible to predict the Twitter activity anywhere throughout the observed spatiotemporal domain, so it is possible to obtain predictions at a finer level than the analyzed bins. The spatial effect plot is created using a grid of 45,000 values (300 longitude × 150 latitude), which is three times the spatial precision (in each direction) used to bin the data for analysis. Similarly, the temporal effect plot is created using a sequence of 100 values spanning the 24-hour time period, which is more than seven times the temporal precision used to bin the data. Lastly, the precision of the SSANOVA solution can be assessed via the Bayesian confidence intervals (see Section 2.3). The dashed lines in Figure 3 plot the 95% confidence interval for the temporal effect, and the 95% confidence interval for the spatial effect is illustrated in Figure 4.

Fit statistics for the regression tree (RT) and support vector (SV) regression models are plotted in Figure 5. For the RT models (see Figure 5, left column), there is no noticeable difference between the solutions obtained from the CST versus the LST data. This is not surprising because, similar to the SSANOVA model with the interaction effect, the RT model is capable of modeling spatial heterogeneity in the temporal Twitter trend; in particular, any offset due to the TZ effect can be easily accounted for by adjusting the spatiotemporal
[Figure 4 panels: 95% CI: Lower Bound; 95% CI: Upper Bound. Axes: Longitude by Latitude, shaded by log(# tweets).]
Figure 4: Lower and upper bounds of 95% Bayesian confidence interval for spatial effect.
partitions used to construct the tree. When enough partitions are included (e.g., 1000–1500), the RT model produces R^2 values similar to those of the SSANOVA models, implying that the RT model is a reasonable candidate for prediction purposes. However, unlike the SSANOVA approach, the RT approach does not provide a functional understanding of the Twitter data. In particular, the RT approach (a) does not attempt to decompose the Twitter data into spatial versus temporal effects, and (b) does not provide useful visualizations of the spatiotemporal properties of the Twitter data. As a result, the RT approach does not allow one to quantify the heterogeneity due to the TZ offset, or understand the periodic temporal nature of the Twitter data.

For the SV regression models (see Figure 5, middle and right columns), there is a noticeable difference between the solutions obtained from the CST versus the LST data. Similar to the SSANOVA results, the SV regression results reveal that the solution improves when the data are analyzed in LST. Also, the SV regression results reveal that the additive model produces a smaller cross-validated mean squared error, whereas the interaction model produces a larger model R^2. However, this comparison should be made with caution because the hyperparameter γ used by the svm function is set according to the model dimension (see Meyer et al., 2014), which differs for the additive and interaction models. In practice, it is possible to tune all of the relevant hyperparameters for the SV regression model (i.e., ε, γ, and the cost C). However, this sort of tuning would require substantial amounts of
[Figure 5 panels: Reg Tree CV−MSE and Reg Tree R−squared (x-axis: Size of Tree); SV Reg CV−MSE (Add), SV Reg CV−MSE (Int), SV Reg R−squared (Add), and SV Reg R−squared (Int) (x-axis: epsilon). Each panel shows separate CST and LST curves.]
Figure 5: Fit statistics for regression tree and support vector regression models.
computation (and lots of patience); note that the SV tuning needed to produce Figure 5 required hours of computation, whereas the RT and SSANOVA models were fit on the order of seconds and minutes, respectively. Furthermore, similar to the RT results, the SV regression results do not provide a functional understanding of the Twitter data, so we do not consider the SV regression results particularly useful for understanding the spatiotemporal properties of the daily Twitter trend.

4. Discussion

4.1. Summary of Findings

Our results reveal that SSANOVA models provide a powerful framework for simultaneously understanding spatial and temporal trends in social media data. In particular, our example demonstrates that the SSANOVA predictions correspond well with the spatial histograms and temporal line plots in Figure 1. However, unlike the histograms and line plots, the SSANOVA models provide additional insight, as well as information that can be used for inference (e.g., assessing model fit, forming confidence intervals, and making predictions). In particular, comparing additive versus interaction SSANOVA models can reveal the amount of spatial heterogeneity in the temporal social media trend. Also, using the Bayesian confidence intervals, the SSANOVA models make it possible to quantify the uncertainty of the estimated effects (e.g., η̂_s and η̂_t). Consequently, the SSANOVA approach makes it possible to determine the statistical significance of the observed spatial and temporal social media trends, which should be useful for a variety of social media analysis problems.

Our results also reveal that the SSANOVA approach offers practical insights above and beyond those obtainable from popular data-driven approaches such as RTs and SV regression models. Unlike these other approaches, the SSANOVA approach decomposes the data into the summation of spatial effects (η̂_s), temporal effects (η̂_t), and spatiotemporal effects (η̂_st), which provide practical insight about the spatiotemporal properties of the Twitter data. Furthermore, by decomposing the data into a summation of effect functions, the SSANOVA approach provides powerful visualizations of the spatiotemporal trends that cannot be obtained from the other methods. Using the RT or SV models, it is only possible to obtain spatial predictions at a particular time point (or temporal predictions at a particular location), so these approaches cannot provide quantifications and visualizations of the marginal trends revealed in Figure 3. Finally, the Bayesian CIs in Figure 4 are not obtainable using the RT and SV regression approaches; note that the precision of these estimates could be obtained via bootstrapping, but this would greatly increase the computational expense.

For our sample of tweets, the different SSANOVA models explained anywhere from 78–80% of the variation in the daily Twitter pattern.
Comparing the SSANOVA results for the CST and LST data revealed that the TZ offset in the periodic temporal Twitter trend accounts for about 1% of the data variation, whereas all other sources of spatial heterogeneity in the temporal Twitter trend account for about 0.4% of the data variation. When the data were analyzed in LST, the model fit statistics revealed that the additive model should be preferred, which indicates that the daily Twitter pattern can be well-explained by an additive function of the spatial trend and the periodic temporal trend; note that this periodic insight is not obtainable using the RT or SV regression approaches. Finally, the Bayesian CIs in Figures 3–4 reveal that we have a very precise estimate of the periodic temporal trend η̂_t, whereas the spatial trend η̂_s has more variability.

4.2. Friday and Binning Effects

In our primary analysis, we only modeled data from Monday through Thursday, due to the apparent deviation of the temporal Twitter trend during Friday afternoon (see Figure 1). To assess this Friday effect, we refit the SSANOVA models with the Friday data included in the analyses. The fit statistics for the Friday models are given in Table 2, which reveals that including Friday has little effect on the model's fit. Consequently, it seems that the apparent Friday effect seen in Figure 1 does not deviate from the typical weekday (Mon–Thu) trend enough to corrupt the SSANOVA estimator.

Table 2: Fit statistics for the smoothing spline ANOVA models with Friday included.

TZ    Model   GCV     R^2     AIC        BIC
CST   add     1.132   0.785   85153.47   95944.80
CST   int     1.074   0.799   83624.11   96358.02
LST   add     1.088   0.794   83909.48   94768.76
LST   int     1.114   0.795   84534.95   98640.93

Note. AIC/BIC = Akaike/Bayesian information criterion.

Also, in our primary analysis, we used a particular choice of bin sizes (100 longitude, 50 latitude, and 13 time), which was determined by an initial visual inspection of the data (via Figure 1). To assess the stability of the SSANOVA estimator across bin sizes, we tried fitting the model with fewer bins (50 longitude, 25 latitude, and 9 time) and with more bins (150 longitude, 75 latitude, and 25 time).² The spatial and temporal effect functions corresponding to the SSANOVA solution with different bin sizes are plotted in Figure 6. Note that the effect functions in Figure 6 reveal patterns similar to those in Figure 3, except that the patterns now reveal less/more spatiotemporal precision in the estimate. This stability of the SSANOVA solution across different bin levels has very important implications for the practical analysis of massive samples of social media data.

² When binning time, note that 9 bins groups the data every three hours, 13 bins groups the data every two hours, and 25 bins groups the data every hour.
[Figure 6 panels: ηs: Less Bins; ηt: Less Bins; ηs: More Bins; ηt: More Bins. Spatial panels: Longitude by Latitude, shaded by log(# tweets); temporal panels: Time (24h, LST) by log(# tweets).]
Figure 6: Spatial and temporal effect functions for additive SSANOVA model fit to LST data with fewer bins (top) and more bins (bottom). When using fewer bins, the correlation between η̂_s from Figures 3 and 6 is 0.88, and the correlation between η̂_t from Figures 3 and 6 is 0.97. When using more bins, the corresponding correlations are 0.87 and 0.98.
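The binning step whose resolution is varied above follows the loop given in the Appendix, and can be sketched roughly as below. This is an illustrative reading of that loop, not the authors' implementation: the helper names, the toy longitudes and hours, and the two-bin longitude breaks are hypothetical, while the temporal breaks (−1, 1, . . . , 25) are the 13 two-hour bins used in the primary analysis.

```python
from bisect import bisect_right
from collections import Counter
import math

def marginal_bin(values, breaks):
    """1-based marginal bin index for each value, given sorted break points."""
    m = len(breaks) - 1  # number of marginal bins m_j
    # bisect_right counts the breaks <= v; clamping puts values at (or beyond)
    # the outer breaks into the first or last bin.
    return [min(max(bisect_right(breaks, v), 1), m) for v in values]

def multi_bin(columns, breaks_per_column):
    """Combine marginal bins into one multidimensional bin label per row."""
    n = len(columns[0])
    g = [1] * n  # g initialised as a vector of ones
    h = 1        # running product of the m_j processed so far
    for x_j, b_j in zip(columns, breaks_per_column):
        z_j = marginal_bin(x_j, b_j)                       # step 1
        g = [g_i + h * (z - 1) for g_i, z in zip(g, z_j)]  # step 2
        h *= len(b_j) - 1                                  # step 3
    return g

# Toy data (hypothetical): three tweets with a longitude and an hour of day.
lon = [-120.0, -90.0, -70.0]
hour = [0.5, 13.0, 23.9]
lon_breaks = [-125.0, -95.0, -65.0]   # 2 illustrative longitude bins
hour_breaks = list(range(-1, 26, 2))  # 13 two-hour bins: -1, 1, ..., 25

g = multi_bin([lon, hour], [lon_breaks, hour_breaks])
counts = Counter(g)
# Response per non-empty bin: log of the number of observations in that bin.
y = {label: math.log(c) for label, c in counts.items()}
```

Only non-empty bins appear in `counts`, which mirrors how n is reported as the number of non-empty bins; changing the number of break points per variable is all that is needed to move between the coarser and finer binnings.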
Once an SSANOVA model is fit to some training data (i.e., once the d̂ and ĉ coefficients are estimated), it is possible to predict the social media trend at any point throughout the observed space-time domain. As a result, it is possible to fit an SSANOVA model using a moderately small number of bins (which is computationally cheap), and then predict the spatiotemporal Twitter trend using a dense grid of points to obtain a higher-resolution visualization of the spatial trend.

In practice, it would be possible to tune the bin size parameters (e.g., via k-fold cross-validation). However, in most cases, the researcher should have an idea about the spatiotemporal precision needed for the particular application. For example, if distinguishing urban from rural patterns is of primary importance, a small number of bins should suffice; in contrast, if distinguishing urban from suburban patterns (or distinguishing neighborhood patterns) is needed, more bins will be required. Essentially, the binning precision decision is comparable to the decision any researcher makes when choosing the measurement precision needed for the particular problem at hand.

4.3. Non-Gaussian Extensions

The estimates η̂_t, η̂_s, and η̂_st are penalized least-squares estimates, and do not require any assumptions about the data or residual distribution. However, for the Bayesian confidence intervals to be valid, the assumption that the errors ε_i are iid N(0, σ^2) is required. In some cases, the distribution of (the log of) a social media trend may be non-Gaussian regardless of the number of bins. For such cases, it is possible to extend the SSANOVA framework to model data from any exponential family distribution. For non-Gaussian data, the SSANOVA estimator minimizes the penalized log-likelihood function

    -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \eta(x_i) - b(\eta(x_i)) \right] + \frac{\lambda}{2} J(\eta)    (9)

where \dot{b}(\eta(x_i)) = E(y_i | x_i), with \dot{b} denoting the first derivative of b (see Gu and Ma, 2005a; Gu, 2013; Wahba et al., 1995; Wood, 2006). The computation for non-Gaussian models is more costly, but this flexible extension makes it possible to model a variety of different spatiotemporal trends in social media data.
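To make the criterion in Equation (9) concrete, the sketch below evaluates it for given fitted values. It is only an illustration under stated assumptions: the function name is hypothetical, the penalty J(η) is passed in as a precomputed number rather than derived from a spline basis, and the Poisson choice b(η) = exp(η) (so that the derivative of b equals the mean count) is one example of an exponential family, not a model fit in this paper.

```python
import math

def penalized_neg_loglik(y, eta, b, lam, penalty):
    """Evaluate -(1/n) * sum[y_i * eta_i - b(eta_i)] + (lam / 2) * penalty."""
    n = len(y)
    fit = -sum(y_i * e_i - b(e_i) for y_i, e_i in zip(y, eta)) / n
    return fit + 0.5 * lam * penalty

# Poisson case (assumed for illustration): b(eta) = exp(eta), so that
# b'(eta) = exp(eta) is the mean of the count at that covariate value.
poisson_b = math.exp

# Hypothetical bin counts and two candidate sets of fitted values.
counts = [3.0, 1.0, 5.0]
eta_saturated = [math.log(c) for c in counts]  # fits every count exactly
eta_flat = [math.log(3.0)] * 3                 # constant fit at the mean count

unpenalized_saturated = penalized_neg_loglik(counts, eta_saturated, poisson_b, 0.0, 0.0)
unpenalized_flat = penalized_neg_loglik(counts, eta_flat, poisson_b, 0.0, 0.0)
```

With λ = 0 the saturated fit always attains the smaller value; the penalty term is what trades that fidelity against the roughness of η.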
Appendix: Spatiotemporal Binning

Let X = {x_ij} ∈ R^{n×p} denote a data matrix, and suppose we want to assign each of the n rows (subjects) of X to a bin based on the p columns (variables). Let m_j denote the number of marginal bins corresponding to the j-th variable, which is a user-tunable parameter. Also, let x_(i)j denote the order statistics for the j-th variable, so that x_(1)j = min_i x_ij and x_(n)j = max_i x_ij. Now define b_j = {b_jk} ∈ R^{m_j + 1} to be a vector of break points for the j-th variable, such that b_j1 ≤ x_(1)j < · · · < x_(n)j ≤ b_j(m_j + 1) defines m_j bins spanning the range of x_ij. After initializing g = 1_n as a vector of ones and h = 1 (scalar), multidimensional bin memberships can be defined by combining marginal bins for each column x̃_j as follows:

Loop for j ∈ {1, . . . , p}
  1. z_j ← bin(x̃_j | b_j)
  2. g ← g + h(z_j − 1)
  3. h ← h m_j
End

where bin(x̃_j | b_j) bins each element of x̃_j according to b_j (so that z_ij ∈ {1, . . . , m_j}). At the completion of the loop, we have g_i ∈ {1, . . . , M}, where g_i denotes the i-th element of g and M = ∏_{j=1}^{p} m_j is the maximum number of multidimensional bins.

Acknowledgements

The project was partially supported by NSF DMS 1438957, DMS 1440037, DMS 1440038. The project was also partially supported by the NCSA/IACAT fellowship and start-up funds from the University of Minnesota.

References

Achrekar, H., Gandhe, A., Lazarus, R., Yu, S. H., Liu, B., 2011. Predicting flu trends using twitter data. In: Computer Communications Workshops (INFOCOM WKSHPS). pp. 702–707.
Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
Asur, S., Huberman, B. A., 2010. Predicting the future with social media. Web Intelligence and Intelligent Agent Technology 1, 492–499.
Becker, R. A., Wilks, A. R., Brownrigg, R., Minka, T. P., 2013. maps: Draw Geographical Maps. R package version 2.3-6. URL http://CRAN.R-project.org/package=maps
Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. Journal of Computational Science 2, 1–8.
Cheng, T., Wicks, T., 2014. Event detection using twitter: A spatio-temporal approach. PloS one 9, e97807.
Cho, E., Myers, S. A., Leskovec, J., 2011. Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 1082–1090.
Corley, C., Mikler, A. R., Singh, K. P., Cook, D. J., 2009. Monitoring influenza trends through mining social media. In: BIOCOMP. pp. 340–346.
Craven, P., Wahba, G., 1979. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31, 377–403.
Culotta, A., 2010a. Detecting influenza outbreaks by analyzing twitter messages.
Culotta, A., 2010b. Towards detecting influenza epidemics by analyzing twitter messages. In: Proceedings of the first workshop on social media analytics. pp. 115–122.
Fan, J., Han, F., Liu, H., 2014. Challenges of big data analysis. National Science Review.
Gu, C., 2013. Smoothing Spline ANOVA Models, 2nd Edition. Springer-Verlag, New York.
Gu, C., Kim, Y.-J., 2002. Penalized likelihood regression: general formulation and efficient approximation. The Canadian Journal of Statistics 30, 619–628.
Gu, C., Ma, P., 2005a. Generalized nonparametric mixed-effect models: Computation and smoothing parameter selection. Journal of Computational and Graphical Statistics 14, 485–504.
Gu, C., Ma, P., 2005b. Optimal smoothing in nonparametric mixed-effect models. The Annals of Statistics 33, 1357–1379.
Gu, C., Wahba, G., 1991. Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing 12, 383–398.
Gu, C., Wahba, G., 1993. Smoothing spline ANOVA with component-wise Bayesian “confidence intervals”. Journal of Computational and Graphical Statistics 2, 97–117.
Helwig, N. E., May 2013. Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Ph.D. thesis, University of Illinois at Urbana-Champaign.
Helwig, N. E., 2015. bigsplines: Smoothing Splines for Large Samples. R package version 1.0-6. URL http://CRAN.R-project.org/package=bigsplines
Helwig, N. E., Ma, P., 2015. Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics 24, 00–00.
Helwig, N. E., Ma, P., in press. Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface.
Hormann, K., Agathos, A., 2001. The point in polygon problem for arbitrary polygons. Computational Geometry 20, 131–144.
Kaplan, A. M., Haenlein, M., 2010. Users of the world unite! The challenges and opportunities of social media. Business Horizons 53, 59–68.
Kim, Y.-J., Gu, C., 2004. Smoothing spline Gaussian regression: More scalable computation via efficient approximation. Journal of the Royal Statistical Society, Series B 66, 337–356.
Kimeldorf, G., Wahba, G., 1970. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics 41, 495–502.
Lampos, V., 2012. Detecting events and patterns in large-scale user generated textual streams with statistical learning methods.
Lampos, V., Cristianini, N., 2010. Tracking the flu pandemic by monitoring the social web. In: IAPR Cognitive Information Processing.
Lampos, V., Cristianini, N., 2012. Nowcasting events from the social web with statistical learning. ACM Transactions on Intelligent Systems and Technology 3, 72.
Lee, R., Sumiya, K., 2010. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks. pp. 1–10.
Li, K.-C., 1987. Asymptotic optimality for $C_p$, $C_L$, cross-validation and generalized cross-validation: Discrete index set. The Annals of Statistics 15, 958–975.
Lipman, V., 2014. Top Twitter trends: What countries are most active? Who’s most popular? http://www.forbes.com/sites/victorlipman/2014/05/24/top-twitter-trends-what-countries-are-most-active-whos-most-popular/.
Ma, P., Huang, J., Zhang, N., 2015. Efficient computation of smoothing splines via adaptive basis sampling. Biometrika 102, 00–00.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., 2014. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-4. URL http://CRAN.R-project.org/package=e1071
Moore, E. H., 1920. On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society 26, 394–395.
Nychka, D., 1988. Bayesian confidence intervals for smoothing splines. Journal of the American Statistical Association 83, 1134–1143.
Padmanabhan, A., Wang, S., Cao, G., Hwang, M., Zhang, Z., Gao, Y., Soltani, K., Liu, Y. Y., 2014. FluMapper: A CyberGIS application for interactive analysis of massive location-based social media. Concurrency and Computation: Practice and Experience 26, 2253–2265.
Penrose, R., 1950. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society 51, 406–413.
Sadilek, A., Kautz, H., Silenzio, V., 2012. Predicting disease transmission from geo-tagged micro-blog data. In: AAAI. pp. 136–142.
Schwarz, G. E., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Signorini, A., Segre, A. M., Polgreen, P. M., 2011. The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE 6, e19467.
Therneau, T., Atkinson, B., Ripley, B., 2015. rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10. URL http://CRAN.R-project.org/package=rpart
Tsou, M. H., Yang, J. A., Lusher, D., Han, S., Spitzberg, B., Gawron, J. M., ..., An, L., 2013. Mapping social activities and concepts with social media (Twitter) and web search engines (Yahoo and Bing): a case study in 2012 US presidential election. Cartography and Geographic Information Science 40, 337–348.
Twitter, 2015. Twitter usage (https://about.twitter.com/company). URL https://about.twitter.com/company
Wahba, G., 1983. Bayesian “confidence intervals” for the cross-validated smoothing spline. Journal of the Royal Statistical Society, Series B 45, 133–150.
Wahba, G., 1990. Spline models for observational data. Society for Industrial and Applied Mathematics, Philadelphia.
Wahba, G., Wang, Y., Gu, C., Klein, R., Klein, B., 1995. Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. The Annals of Statistics 23, 1865–1895.
Wang, S., Cao, G., Zhang, Z., Zhao, Y., Padmanabhan, A., 2012. A CyberGIS environment for analysis of location-based social media data. In: Hassan, A. K., Amin, H. (Eds.), Location-Based Computing and Services, 2nd Edition. CRC Press, pp. 187–205.
Wang, Y., 1998a. Mixed effects smoothing spline analysis of variance. Journal of the Royal Statistical Society, Series B 60, 159–174.
Wang, Y., 1998b. Smoothing spline models with correlated random errors. Journal of the American Statistical Association 93, 341–348.
Wood, S. N., 2003. Thin plate regression splines. Journal of the Royal Statistical Society, Series B 65, 95–114.
Wood, S. N., 2006. Generalized additive models: An introduction with R. Chapman & Hall, Boca Raton.
Zhang, D., Lin, X., Raz, J., Sowers, M., 1998. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association 93, 710–719.
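Supplementary sketch: the binning loop in the Appendix (Spatiotemporal Binning) might be written along the following lines. This is a minimal pure-Python illustration, not the authors' implementation. The function names `marginal_bin` and `multidim_bin` are hypothetical, and the choice of left-closed bins (with the largest break point assigned to the last bin) is an assumption, since the appendix does not state which side of each bin is closed.

```python
from bisect import bisect_right


def marginal_bin(values, breaks):
    """Return 1-based bin indices in {1, ..., m} given m + 1 sorted break points."""
    m = len(breaks) - 1
    # bisect_right counts the break points <= v; clamping keeps boundary
    # values (v == breaks[-1]) inside the last bin. Assumed convention.
    return [min(max(bisect_right(breaks, v), 1), m) for v in values]


def multidim_bin(X, breaks_per_column):
    """Combine marginal bins column by column into labels g_i in {1, ..., M}."""
    g = [1] * len(X)  # g = 1_n
    h = 1             # running product of the m_j processed so far
    for j, breaks in enumerate(breaks_per_column):
        m_j = len(breaks) - 1
        z = marginal_bin([row[j] for row in X], breaks)       # step 1
        g = [g_i + h * (z_i - 1) for g_i, z_i in zip(g, z)]   # step 2
        h *= m_j                                              # step 3
    return g, h  # h equals M, the product of all m_j


# Toy example: 3 observations, 2 variables, with m_1 = 2 and m_2 = 3 bins
X = [[0.1, 5.0], [0.9, 15.0], [1.5, 25.0]]
breaks = [[0, 1, 2], [0, 10, 20, 30]]
g, M = multidim_bin(X, breaks)
print(g, M)  # prints [1, 3, 6] 6
```

Because `h` accumulates the product of the marginal bin counts, each combination of marginal bins maps to a distinct label, which is why the labels never exceed $M$.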