GIS and Placemaking Using Social Media Data

2.16

GIS and Placemaking Using Social Media Data

Yan Chen, University of North Carolina-Chapel Hill, Chapel Hill, NC, United States © 2018 Elsevier Inc. All rights reserved.

2.16.1 Introduction
2.16.2 Background
2.16.2.1 A Newly Emerged Data Source Type
2.16.2.2 Placemaking Measures for Human Activities
2.16.3 Method and Result
2.16.3.1 Data Collection
2.16.3.1.1 API and JavaScript object notation
2.16.3.1.2 R
2.16.3.1.3 Extracting information from Twitter API using R script
2.16.3.2 Point Visualization
2.16.3.3 Vibrant Center Analysis
2.16.3.3.1 Spatial join for point count
2.16.3.3.2 Kernel density
2.16.3.3.3 100 m fishnet hotspots
2.16.4 Discussion
References

2.16.1

Introduction

Since Jane Jacobs and William Whyte, using physical space to promote human activity has been the central objective of the placemaking field. Observing the spatial distribution of human activities is the essential way to explore this topic, and the results help planners make more precise and better-informed decisions in urban planning projects. For instance, which store on the street attracts more consumers? Where do people like to walk outside their home? Which corner of the plaza has more people lingering? Which place draws more attention from tourists? Where in the city do more people like to hang out at a bar at night? More importantly, does any feature of the physical urban form contribute to these patterns? Answering these placemaking questions requires information on human activity patterns. However, collecting and analyzing the locations of human activity has traditionally been difficult and labor intensive, requiring large numbers of investigators to stay outside and remain highly focused in order to record the locations of human activities on a map. To reduce this burden, placemaking research has always been associated with the development of new technology and new data environments. In the late 1970s, Whyte (1980) used time-lapse photography to record the location of people in New York public spaces, laying the foundation for the placemaking discipline. Since the 1990s, geographic information systems (GIS), as a robust method for spatial data mining, have been more and more widely used to improve decisions in the placemaking process. Now, with the rise of big data and open data, the field has substantial new potential. On the other hand, human activity patterns themselves also change with the development of technology, so the old theories might need to be updated.
In the last 30 years, the internet and social media revolution has helped people connect more closely with each other, and mobile devices and wireless networks have extended that connectivity through social media websites. A recent report (Ofcom, 2015) shows that internet users aged 16 and above claimed to spend over 20 h and 30 min online each week in 2015, whereas in 2005 the figure was only 9 h and 54 min. Another report from Global Web Index (2015) shows that a typical internet user now spends 1.77 h on social networking every day. In 2005, the term “social network” was rarely known. A new type of activity has therefore quickly emerged and become a critical part of our lives that no field should ignore, especially placemaking. In this article, I introduce a set of newly emerged methods for analyzing the spatial distribution of human activity using social media data in order to identify the most vibrant places in the city. There are three major parts of this analysis: 1. Social media data collection. 2. Point visualization. 3. Vibrant center identification. For the first part, using Twitter as an example, I mainly discuss a new data collection technique, the application programming interface (API), and show how to use it with R to write a simple script that automatically collects tweet data in real time. For the second part, once the data are collected, the next task is to visualize them in order to show the spatial distribution of tweets on the map based on their coordinates. A simple and straightforward point visualization method is introduced here using Esri ArcGIS 10.4 software.

For the third part, the data are analyzed to reveal the location of the most popular and vibrant areas in the city, again using Esri ArcGIS 10.4. Three methods are discussed: (1) a spatial join that connects the point data with a polygon dataset, such as census block group boundaries, to get the count of tweets in each block group; (2) a kernel density estimate of the spatial distribution of tweet points; (3) a hot-spot analysis based on the tweet count in a 100-m fishnet. The open data offered by social media companies provide an excellent opportunity to show how people spatially cluster in a city, an opportunity that traditional manual collection methods cannot offer. Also, with the help of R and GIS, all the analysis introduced in this article can be replicated in other places using point data of human activity.

2.16.2

Background

2.16.2.1

A Newly Emerged Data Source Type

Modern smartphones, with the location-sensing ability provided by GPS, can automatically identify a user’s current location and attach it to the user’s Twitter messages, the so-called “tweets,” as geo-tagged information in a precise form (such as latitude and longitude). The ease of location-based social networking services, represented by Twitter, has started a popular trend of sharing and communicating all over the world, which is usually associated with “Web 2.0” (O’Reilly, 2009). Many fields have noticed this new behavior and the considerable potential of its data to transform traditional analysis; some geography scholars refer to it as “volunteered geographic information” (Goodchild, 2007). Since these capabilities emerged, more and more quantitative studies have assumed or indicated that social network behavior in the virtual world is related to our activities in real life. Lee and Sumiya (2010) made an early attempt, developing a geosocial event detection system to monitor unusual crowd events in Japan via Twitter data. In recent years, studies of this relationship have become more popular and diversified, and they suggest from different perspectives that social media data can be a reliable source for human behavior research. Crooks et al. (2013) analyzed the spatial and temporal characteristics of tweets responding to an earthquake on the US East Coast and found that Twitter data allow the impact area of the event to be identified and localized. Rösler and Liebig (2013) proposed a method for segmenting a city into clusters based on activity profiles using geotagged check-in data from Foursquare; the result shows clear patterns separating areas known for different activities, such as nightlife or daily work. Hawelka et al. (2014) analyzed geo-located Twitter messages to uncover global patterns of human mobility and found Twitter data “exceptionally useful” for understanding and quantifying those patterns. Widener and Li (2014) found a significant association between the content of geo-tagged tweets about unhealthy foods and the income status of the local area. Malleson and Andresen (2014) argued that Twitter data are a useful source for measuring the mobile population at risk of violent crime. Steiger et al. (2015) correlated Twitter with UK census data and found that tweets show characteristic spatiotemporal semantic frequencies indicating human activities. The potential of social media data for studying human activity has also attracted attention in the planning field. Evans-Cowley and Griffin (2011) analyzed information posted on Twitter to facilitate public participation in the Strategic Transportation Mobility Plan (STMP) of Austin, Texas. Schweitzer (2014) examined the sentiment of Twitter content about public transit and how interaction on Twitter influences the tone of public comments. Lansley and Longley (2016) classified geotagged tweets from Inner London and provided geotemporal Twitter profiles of behavior and social attitudes. Nadai et al. (2016) used mobile phone data to test the four conditions of Jane Jacobs’s (Jacobs, 1961) theory in six Italian cities, finding those conditions to be indeed associated with urban life.

2.16.2.2

Placemaking Measures for Human Activities

On the other hand, if we zoom out and treat geo-tagged tweeting as a general human behavior, the urban planning field has conducted numerous studies on the relationship between human activity and the quality of urban form that support our research questions. Jacobs (1961) first noticed this issue. Challenging the traditional paradigm in planning, she proposed that mixed uses, small blocks, various building sizes, and population density are essential for a lively place. These ideas later became the foundation of “placemaking” theories. Whyte (1980) went a step further, using time-lapse filming to record and analyze social life in the public spaces of New York. He found that crowds served as one major attraction of public space and that water, trees, and food also had a positive impact. Lynch (1984) proposed a normative framework for good urban form that included vitality, sense, fit, access, control, efficiency, and justice. Gehl (1987) analyzed the relationship more deeply by dividing activities into three types: necessary activities, optional activities, and social activities. He proposed that optional and social activities are more influenced by the built environment than necessary activities, which are more or less compulsory. Montgomery (1998) summarized the previous literature and proposed 12 physical conditions of good placemaking. Mehta (2007) applied a method similar to Whyte’s to street life in Boston and found that people are equally concerned with the social, land use, and physical aspects of the street. In recent years, other planning studies have examined the relationship from the perspective of travel behavior or physical activity. Cervero and Kockelman (1997) first examined the connection between travel demand and the density, diversity, and design of the built environment.
This framework was refined by many later studies, which added destination accessibility, distance to transit, and demographic factors. It eventually became the “6Ds” theory, which has been supported by more than 50 studies (Ewing and Cervero, 2010). Ewing et al. (2015) measured 20 streetscape features for 588 blocks in New York City and found that 3 of the 20 features were positively related to pedestrian counts. Handy et al. (2002) provided a theoretical framework for how the built environment influences physical activity patterns, which has been supported by later research (Brownson et al., 2009; Ewing et al., 2008; Hoehner et al., 2005).

However, traditional survey data set limits on such research. First, the self-reported data frequently used to measure human behavior usually have reliability issues. As Cook and Campbell (1979) pointed out, people tend to report what they believe the researcher expects to see, or what reflects positively on their own abilities, knowledge, beliefs, or opinions. Another concern is whether subjects can accurately recall past behaviors. Cognitive psychologists have warned that human memory is fallible (Schacter, 1999), and thus the reliability of self-reported data is tenuous. The user-generated content of social media can also be considered a special type of self-reported data, but the information is recorded by the service provider immediately after it is produced, so, unlike traditional data, it is not affected by the purpose of the researcher or by human memory. Second, for observational data, investigators have to survey, observe, and record manually on site in order to mark the location of every type of behavior. Whyte (1980) tried to improve the process with time-lapse photography, but people still had to inspect and analyze all the pictures manually. The scale and time range of such research are limited to the daytime and, at most, the community scale. Travel and physical activities are more easily collected by letting respondents recall their frequency. However, both types of behavior lack social activity content, which is also a major type of human behavior, especially in public spaces. Social media data can remedy the deficits of the previous methods. They allow us to collect tweets from all around the world, 24-7, continuously, and with detailed location information. Tweeting is also an optional and social activity by Gehl’s (1987) definition, and therefore could offer more understanding of how to make a lively place.

2.16.3

Method and Result

2.16.3.1

Data Collection

2.16.3.1.1

API and JavaScript object notation

Although a huge amount of data is now open and available to the public, most of it is stored on cloud servers with no direct download link or webpage. Such data are extracted through a Web application programming interface (Web API), and the format used to transfer them is JavaScript Object Notation (JSON). Generally, a Web API is a set of protocols that specify how our own computer (the client) communicates with the data server to receive data. The command that the client sends to the server is called the “request,” and the data returned by the server is called the “response.” JSON is a lightweight data-interchange format. Its merits are that it is easy for humans to read and write, easy for machines to parse and generate, and flexible enough to store various datasets in a consistent format. In our example, the Twitter data are likewise extracted through the API of the Twitter server.
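To make the JSON format concrete, the sketch below parses one simplified, hand-written tweet object in R. It assumes the jsonlite package is installed; the field names follow the general shape of a geo-tagged tweet (a `coordinates` object holding longitude first, then latitude), but the values are invented for illustration and the object is far smaller than a real API response.

```r
library(jsonlite)  # assumed to be installed; provides fromJSON()

# A hand-written, heavily simplified tweet object (illustrative values only)
tweet_json <- '{
  "created_at": "Mon Jan 04 18:30:00 +0000 2016",
  "text": "Lunch at Millennium Park",
  "coordinates": {"type": "Point", "coordinates": [-87.6226, 41.8826]}
}'

tweet <- fromJSON(tweet_json)  # JSON text -> nested R list

lon <- tweet$coordinates$coordinates[1]  # longitude comes first
lat <- tweet$coordinates$coordinates[2]  # latitude comes second
cat(sprintf("%s (lon %.4f, lat %.4f)\n", tweet$text, lon, lat))
```

The nested list mirrors the nesting of the JSON text, which is why a response can be flattened into a table with one row per tweet and one column per field.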

2.16.3.1.2

R

R is one of the most popular data mining programs in use today. It is a GNU project similar to the S language and environment, which was developed at Bell Laboratories by John Chambers and colleagues. R is a free software environment for statistical computing and graphics that compiles and runs on a wide variety of UNIX platforms, Windows, and macOS. Moreover, R provides a broad range of data-mining techniques and is highly extensible through its numerous packages. I therefore chose R, together with its package “streamR,” to write the code for collecting data from the Twitter API.

2.16.3.1.3

Extracting information from Twitter API using R script

There are two main Twitter Web API systems from which we can get data: the Streaming APIs and the REST APIs. The Streaming APIs give developers low-latency access to Twitter’s global stream of tweet data. A properly implemented streaming client is pushed messages indicating that tweets and other events have occurred. If the purpose is to conduct particular searches, read user profiles, or post tweets, it is better to use the REST APIs instead. In this case, I show how to use the Streaming APIs to get the data. To connect to the Twitter Streaming API, we first need a consumer key and secret from Twitter, which are obtained by creating a new application at https://apps.twitter.com/. Fig. 1 shows what the page looks like. Enter the name, a brief description, and a website (you can use a blog or a placeholder); after you agree to the terms, a screen like that in Fig. 2 appears, where you get the consumer key and consumer secret. Then we can start writing the code to extract the data.

Fig. 1 Application to create an app.

Fig. 2 Get keys from apps setting.

The following code collects tweets in the Chicago area from the Twitter Streaming API for 1 hour and saves the data into a csv file called “Chicago.csv”:

install.packages(c("streamR", "ROAuth"))
library(streamR)
library(ROAuth)

# OAuth endpoints and application credentials (placeholders)
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"

my_oauth <- OAuthFactory$new(consumerKey = consumerKey, consumerSecret = consumerSecret,
                             requestURL = requestURL, accessURL = accessURL, authURL = authURL)
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

## Alternatively, it is also possible to create a token without the handshake:
accessToken <- "zzzzzzzzzzzzzzzzhhhhhhh"
accessTokenSecret <- "1234567aaa"
my_oauth <- createOAUthToken(consumerKey, consumerSecret, accessToken, accessTokenSecret)

# Bounding box as c(west, south, east, north); approximate extent of Chicago
tweets <- filterStream(file.name = "", locations = c(-87.94, 41.64, -87.52, 42.02),
                       timeout = 3600, oauth = my_oauth)
tweets.df <- parseTweets(tweets)      # convert the raw JSON into a data frame
write.csv(tweets.df, "Chicago.csv")
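Not every streamed tweet carries exact coordinates, so a small cleaning step is usually needed before mapping. The sketch below uses base R only and a tiny hand-made data frame standing in for the parsed tweets; the column names `lon` and `lat`, the helper `keep_geotagged()`, and the bounding box values are assumptions for illustration and should be adjusted to the columns actually present in the csv file.

```r
# Keep only rows with usable coordinates inside a bounding box.
# bbox is c(west, south, east, north) in decimal degrees.
keep_geotagged <- function(df, bbox) {
  ok <- !is.na(df$lon) & !is.na(df$lat) &
    df$lon >= bbox[1] & df$lon <= bbox[3] &
    df$lat >= bbox[2] & df$lat <= bbox[4]
  df[ok, , drop = FALSE]
}

# Stand-in for the parsed tweets (invented values)
tweets.df <- data.frame(
  text = c("a", "b", "c", "d"),
  lon  = c(-87.6226, NA, -74.0060, -87.6354),
  lat  = c(41.8826, NA, 40.7128, 41.8789)
)

chicago_bbox <- c(-87.94, 41.64, -87.52, 42.02)  # approximate, assumed
geo <- keep_geotagged(tweets.df, chicago_bbox)
print(nrow(geo))  # number of tweets kept for mapping

# Write the cleaned table to a temporary folder for import into GIS software
write.csv(geo, file.path(tempdir(), "Chicagodata.csv"), row.names = FALSE)
```

Rows without coordinates and rows outside the study area are dropped, so the table that reaches the GIS software contains only points that can actually be placed on the map.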

2.16.3.2

Point Visualization

The tweets are visualized on the map in ArcGIS 10.4. Using the techniques discussed in the last section, I recorded the tweets in the Chicago downtown area for two months and stored the data in a csv file. To conduct the analysis, we first need to transform the coordinate information in the collected data into an ArcGIS shapefile, as follows. Open ArcMap, then use the “Add Data” button in the toolbar to add the table “Chicagodata.csv” to ArcMap. This step imports the data into the software. After the import, ArcMap still needs to be told which attributes hold each tweet’s location so that it can turn the numbers into point coordinates. To do this, select the “File” option in the menu and click “Add Data >>> Add XY Data,” as shown in Fig. 3. In the settings, X Field stands for the longitude and Y Field stands for the latitude; choose the corresponding attributes for those two fields. In addition, every set of coordinates refers to a particular coordinate system, which has to be defined correctly in ArcGIS. For the tweet data we collected, the system is the World Geodetic System 1984 (WGS 1984). Click the “Edit” button and then choose the geographic coordinate system at “Geographic coordinate system >>> World >>> WGS_1984” (Fig. 4).

Fig. 3 Add data with coordinates.

Fig. 4 The setting for add XY data.

Fig. 5 Primary visualization of tweet points.

After clicking “OK,” the data are added to the map as an initial draft visualization, as Fig. 5 shows. Each point on the map represents one tweet at that location. To locate the points on the Chicago city map, we can add a base map in the background. Click “Add data >>> Add base map” and choose a map style; ArcGIS will then automatically match all the tweet points to their corresponding locations on the map. In most cases, the “Imagery” and “World Dark Grey Canvas Base” styles are recommended. “Imagery” is based on high-resolution satellite remote sensing images, while “World Dark Grey Canvas Base” is simpler and more abstract. The map with “Imagery” is shown in Fig. 6.

Fig. 6 After adding the “Imagery” base map.

Fig. 7 After changing the symbology and base map.

As shown in Fig. 6, the default point symbol is often hard to see on the map, so we need to adjust its color and size in the symbology settings. Double-click the symbol of the point layer to open the “Symbol Selector” window. Set the color to yellow and the size to one, then change the base map style to “World Dark Grey Canvas Base.” The distribution of tweets becomes much clearer than before, as shown in Fig. 7. Then go to the “Layout View” to add a north arrow and scale bar so that the scale of the map is easier to understand. The finished product is shown in Fig. 8. After this, the map can be saved and the points exported as a new shapefile. Even without any further analysis, the map shows that tweeting activity is strongly spatially autocorrelated. Most of the tweets are clustered in the downtown region, with a clear spatial pattern along the street network. This suggests that social media activity is associated with the physical attributes of space in the city and could be used as a proxy for observing citizens’ public life.

2.16.3.3

Vibrant Center Analysis

Although point visualization can provide a rough picture of where the tweets are located, one critical cartographic issue makes it unreliable for analyzing the vibrancy of the city: overlaps. Because the resolution of the image is limited, many points cluster together and some of them overlap. For example, if a thousand tweets are posted in a tiny area, the result on the map may look like only one or two dots. This leads to incorrect estimates of the intensity of tweeting behavior in highly clustered places, so the vibrancy of such areas may be underestimated in the analysis. To address this issue, I discuss three methods for identifying the vibrant center of the city from the collected data; Table 1 compares them with point visualization. Each has its own advantages and disadvantages. Spatial join is a basic analysis that shows the count of points in each areal unit, which in this project is the census block group polygon. It is simple, but it does not consider the relationship between each unit and its neighbors and therefore cannot show regions of spatial clustering. A relatively more complicated method is kernel density estimation; it helps to show spatial clusters, but it still suffers from potential accuracy issues. The third method, hotspot analysis based on a 100-m fishnet, is the most rigorous and precise tool for revealing clusters of tweets on the map. However, it also requires very high computation power, which limits its application. The hotspot method can also be applied to the spatially joined polygon data (census block groups), which makes it very easy to compare the result with other socioeconomic indicators at the same unit (Table 1).

Fig. 8 Tweet points visualization for Chicago downtown.

Table 1 The comparison of each analysis method

Method               Computation  Accuracy  Comparison with socioeconomic indicators
Point visualization  Low          Low       Hard
Spatial join         Low          Low       Easy
Kernel density       Low          Medium    Hard
100-m fishnet        High         High      Hard

2.16.3.3.1

Spatial join for point count

The spatial join tool is a GIS operation that attaches data from one feature’s attribute table to another feature based on their spatial relationship. In this case, we need to connect the tweet point feature to a polygon feature of census block group boundaries to get the count of tweets in each polygon (Fig. 9). A spatial join matches the join features to the target features. In this process, all attributes of the join features are attached to the attributes of the target features and then copied over to the output feature class. To perform the analysis in ArcGIS, click the ArcToolbox button to open the list of analysis tools. Then select “Analysis Tools >>> Overlay >>> Spatial Join” to open the window of the spatial join tool. The “Target Feature” is the census block group polygon and the “Join Feature” is the point feature of tweets. The “Join Operation” is “Join One to One.” This means that if multiple tweet points have the same spatial relationship with a single census block group polygon, the attributes of those points are aggregated into a count value. The match option is “Contains,” which means that the join features (tweet points) are matched if a target feature (census block group boundary) contains them. Then run the analysis to get the count of tweet points in each polygon (Fig. 10).
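The counting logic behind this join can be sketched outside ArcGIS. The example below is a minimal base R illustration, not the ArcGIS implementation: it uses a standard ray-casting test to decide which polygon contains each point and then tallies the points per polygon. The two square “block groups,” the point coordinates, and the helper `point_in_polygon()` are invented for illustration.

```r
# Ray-casting test: TRUE if point (px, py) lies inside the polygon
# whose vertices are given in order by the vectors vx and vy.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)  # edge spans the point's y value
    if (crosses) {
      x_at_py <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      if (px < x_at_py) inside <- !inside    # edge lies to the right: toggle
    }
    j <- i
  }
  inside
}

# Two hypothetical block groups (unit squares side by side)
polygons <- list(
  BG1 = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  BG2 = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)

# Hypothetical tweet locations; the last one falls outside both polygons
pts <- data.frame(x = c(0.2, 0.5, 0.9, 1.5, 3.0),
                  y = c(0.2, 0.5, 0.1, 0.5, 0.5))

# One-to-one join: each polygon receives the count of points it contains
tweet_count <- sapply(polygons, function(p) {
  sum(mapply(point_in_polygon, pts$x, pts$y,
             MoreArgs = list(vx = p$x, vy = p$y)))
})
print(tweet_count)
```

Because the result has one row per polygon, it can be placed next to other attributes recorded for the same units, which is what makes this method easy to compare with socioeconomic indicators.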

2.16.3.3.2

Kernel density

Fig. 9 The tweets and boundary of census block groups.

Fig. 10 The count of tweets in each census block group.

Fig. 11 Primary result of kernel density estimation.

Fig. 12 Result after change classification method.

Kernel density calculates a magnitude per unit area from point features, using a kernel function to fit a smoothly tapered surface to each point, and therefore shows where points are highly clustered on the map. To perform this analysis, open the ArcToolbox and select “Spatial Analyst Tools >>> Density >>> Kernel Density.” Then set the “Input point or polyline features” as the tweet point shapefile and leave the other fields at their defaults. The primary result is shown in Fig. 11. Although this result shows the cluster, the map conveys little detail. To show more information, in the “Symbology” tab of the layer properties, change the classification method from “equal interval” to “natural breaks,” so that class boundaries follow the natural groupings in the density values rather than equal steps. The map then appears as shown in Fig. 12, which reveals much more detail about the cluster distribution. To locate the clusters on the map, set the lowest value category to “no color” to make the background transparent, then add a base map in the “World Dark Grey Canvas Base” style. The final product is shown in Fig. 13. This map illustrates how different levels of tweet clusters are distributed. Most tweeting activity happens in the downtown district, with some low-intensity clusters lying outside it. Within the downtown region, there are a few locations with very intensive tweeting activity, shown as the white points on the map. Kernel density is a relatively easy and powerful method; however, because it estimates values by smoothing, the result is not very accurate. An area where tweets are impossible may sometimes be evaluated as highly clustered because it lies between two tweet clusters. Also, the size and location of a cluster are usually too rough to pinpoint. Therefore, if only a quick and conceptual analysis is required, kernel density is appropriate. If more precise results are needed, such as for research, the next method is better.
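As a rough illustration of what the kernel density surface represents, the following base R sketch sums a quartic kernel centered on each point over a regular grid. The point coordinates, the 300-m search radius, the 100-m cell size, and the helper `kernel_density()` are assumptions chosen for illustration; ArcGIS applies its own default search radius and scaling, so the numbers will not match its output.

```r
# Quartic-kernel density on a regular grid (points per square meter).
kernel_density <- function(px, py, gx, gy, radius) {
  dens <- matrix(0, nrow = length(gx), ncol = length(gy))
  for (k in seq_along(px)) {
    # squared distance from point k to every cell center, scaled by the radius
    u2 <- outer((gx - px[k])^2, (gy - py[k])^2, "+") / radius^2
    w <- ifelse(u2 < 1, (3 / pi) * (1 - u2)^2, 0)  # zero beyond the radius
    dens <- dens + w / radius^2
  }
  dens
}

# Hypothetical projected coordinates in meters: a tight cluster plus one outlier
px <- c(500, 520, 480, 510, 1500)
py <- c(500, 490, 515, 505, 1500)

cell <- 100                                      # cell size in meters (assumed)
gx <- seq(cell / 2, 2000 - cell / 2, by = cell)  # cell-center x coordinates
gy <- seq(cell / 2, 2000 - cell / 2, by = cell)  # cell-center y coordinates

dens <- kernel_density(px, py, gx, gy, radius = 300)

peak <- which(dens == max(dens), arr.ind = TRUE)[1, ]
cat("Densest cell center:", gx[peak[1]], gy[peak[2]], "\n")

# Each point contributes a total mass of about 1, so the sum over cells times
# the cell area should be close to the number of points
cat("Approximate point total:", round(sum(dens) * cell^2, 2), "\n")
```

The sketch also shows the smoothing problem mentioned above: every cell within the search radius of a point receives some density, whether or not a tweet could actually occur there.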

2.16.3.3.3

100 m fishnet hotspots

Hotspot analysis can also tell where data are spatially clustered and is more widely accepted in academia. The hotspot analysis tool calculates the Getis-Ord Gi* statistic, developed by Getis and Ord, for each feature in a dataset to assess evidence of spatial patterns (Getis and Ord, 1992; Ord and Getis, 2010). It has been widely used in urban studies on topics such as crime (Craglia et al., 2000), land price (Paez et al., 2001), cultural industry (Currid and Williams, 2010) and urban sprawl (Hamidi et al., 2015). The resultant Z-scores and p-values tell where features with either high or low values cluster spatially. For statistically significant positive Z-scores, the larger the Z-score, the more intense the clustering of high values. The method works by looking at each feature within the context of its neighboring features, as defined by a spatial weighting scheme. To be a statistically significant hotspot, a feature needs to have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features. If the calculated local sum differs from the expected local sum by more than random chance would explain, the Z-score is statistically significant. To perform the analysis, we first need to create a fishnet over the study boundary, because hotspot analysis needs a value for each feature and cannot be run directly on the raw tweet points. It is therefore critical to convert the point pattern into a polygon grid with a count in each cell. Unlike kernel density, hotspot analysis also needs a boundary shapefile to define the edge of the analysis. To create a fishnet with measurable units, the boundary must then be projected. In contrast to the WGS 1984 system, which gives positions as angles on a spheroid, a projected coordinate system is defined on a flat, two-dimensional surface. Open ArcToolbox, then select the “Data Management Tools >>> Projections and Transformations >>> Project” tool to conduct the transformation. Because the site is located in Chicago, the setting needs to be “Projected coordinate system >>> State Planes >>> NAD 1983 (2011) (Meters) >>> NAD 1983 (2011) StatePlane Illinois East FIPS 1201.” To make sure that the cells at the edge of the boundary have normal values and to avoid a “boundary effect,” the boundary shapefile actually used needs to be larger than the one defined before. In this case, I take a 500-m buffer, created with “Geoprocessing >>> Buffer” (Fig. 14).

Fig. 13 Kernel density estimation of tweets in Chicago downtown.

Then I created the fishnet using the tool “Data Management Tools >>> Feature Class >>> Create Fishnet.” The extent is the buffer shapefile just created. Cell width and height are both 100 (m) here. Uncheck the “Create Label Points” box and select the geometry type “Polygon.” The fishnet is then created (Fig. 15). The fishnet is still in the projected coordinate system, while the tweet points are in the WGS 1984 system set before. So, to spatially join the tweets to the fishnet and get the count of tweet points in each cell, the fishnet file needs to be projected back to WGS 1984 using “Data Management Tools >>> Projections and Transformations >>> Project.” The next step is to use the “Analysis Tools >>> Overlay >>> Spatial Join” tool to get the count of points in each polygon, as in the “Spatial join for point count” section. Next comes the most important step in the analysis, hotspot detection. I used “Spatial Statistics Tools >>> Mapping Clusters >>> Hot Spot Analysis (Getis-Ord Gi*)” in ArcToolbox to perform the analysis. For the settings, the input field is the count of tweets calculated in the spatial join step, and the conceptualization of spatial relationships is “INVERSE_DISTANCE.” The primary result is then computed (Fig. 16).
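For readers who want to see what the tool computes, the base R sketch below bins hypothetical projected points into 100-m cells and then evaluates the Getis-Ord Gi* z-score for each cell using the published formula. It is an illustration under simplifying assumptions, not the ArcGIS implementation: the points are simulated, the study area is a 1-km square, and neighbors are defined by a binary fixed-distance band (150 m, including the cell itself) rather than the “INVERSE_DISTANCE” option. The helper name `gi_star()` is hypothetical.

```r
set.seed(1)

# Simulated projected coordinates (meters): background noise plus one cluster
x <- c(runif(200, 0, 1000), rnorm(150, mean = 750, sd = 40))
y <- c(runif(200, 0, 1000), rnorm(150, mean = 250, sd = 40))
keep <- x >= 0 & x < 1000 & y >= 0 & y < 1000
x <- x[keep]
y <- y[keep]

# Fishnet: assign each point to a 100-m cell and count the points per cell
cell <- 100
n_side <- 10
col_id <- floor(x / cell) + 1
row_id <- floor(y / cell) + 1
counts <- table(factor(row_id, levels = 1:n_side),
                factor(col_id, levels = 1:n_side))
grid <- expand.grid(row = 1:n_side, col = 1:n_side)
grid$count <- as.vector(counts)      # column-major order matches expand.grid
grid$cx <- (grid$col - 0.5) * cell   # cell-center coordinates
grid$cy <- (grid$row - 0.5) * cell

# Getis-Ord Gi* z-score with binary weights inside a distance band
gi_star <- function(value, cx, cy, band) {
  n <- length(value)
  d <- as.matrix(dist(cbind(cx, cy)))
  w <- (d <= band) * 1               # includes the cell itself (d = 0)
  xbar <- mean(value)
  s <- sqrt(sum(value^2) / n - xbar^2)
  sw <- rowSums(w)
  sw2 <- rowSums(w^2)
  num <- as.vector(w %*% value) - xbar * sw            # local sum minus expected
  den <- s * sqrt((n * sw2 - sw^2) / (n - 1))
  num / den
}

grid$z <- gi_star(grid$count, grid$cx, grid$cy, band = 150)

# Cells significant at roughly the 95% level on the high side
hot <- grid[grid$z > 1.96, c("row", "col", "count", "z")]
print(hot[order(-hot$z), ])
```

A cell is flagged only when its own count and its neighbors' counts are jointly high relative to the whole grid, which is why an isolated busy cell scores lower than a busy cell surrounded by other busy cells.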

Fig. 14 Research boundary and boundary buffer of the analysis.

After getting the draft result, I need to trim it back to the original boundary, because I buffered the analysis range previously. Use the “Select by Location” tool to select the hotspot cells within the boundary, then export them as a new file for further analysis. The primary hotspot analysis result is shown in Fig. 17. As with the kernel density result, I set the outline color and the “Not significant” fill color to transparent so that the base map shows through as the background, and then added the “Street” style base map. This makes it possible to identify the location of each hotspot accurately. Based on the map, I can then go to Google Maps to see whether there are any special places at those locations. The result in Fig. 18, hotspots of tweet clusters in Chicago downtown, shows that the most vibrant places in Chicago downtown are mainly for recreation and leisure activities. These places fall into six categories: (1) green and social spaces, including Millennium Park, Buckingham Fountain Flower Gardens, Maggie Daley Park and Columbus Plaza; (2) landmark business buildings, such as Willis Tower, Hancock Tower and the Chicago Board of Trade Building; (3) museums, such as the Art Institute of Chicago, Shedd Aquarium and the Field Museum; (4) tourist destinations, such as Heald Square Monument, City Hall and Navy Pier; (5) sports facilities, such as Soldier Field; (6) recreation blocks, such as the bar streets around Hubbard Street. This suggests that spatial analysis using social media data can reflect the activity of urban public life and illustrate which places are more popular with the public.

Fig. 15 The 100-m fishnet.

2.16.4

Discussion

Placemaking theories have been receiving more and more attention from fields related to urban planning, including transportation, sociology, public health and economic development. In essence, the key question at the intersection of these areas is how to use physical space settings to promote certain human behavior, whether walking, communication, physical activity or consumption. Placemaking as a discipline mainly focuses on this topic. As the example in this article shows, GIS techniques can further improve the potential of the placemaking field in three major ways:

• Data collection.
• Data visualization.
• Data analysis.

The first is data collection, illustrated here by the case of social media data scraping. The placemaking field has grown since the 1960s, but few exciting new theories have been proposed since the turn of the 21st century. I think this stagnation arises mainly because traditional methods of collecting human activity data have limited the ways in which researchers can reexamine those theories and develop them further. The recent open data and big data trends in the information and communication technology field will help placemaking break through this ceiling. Against this background, the study sought to develop a method for collecting social media activity counts at a flexible spatial scale in order to build an analysis model for placemaking decisions, and it provides a set of well-defined spatial statistical models on that basis. The most powerful advantage of this analysis is the data quality: high-resolution geographic information is recorded 24/7 without interruption. More importantly, the data are completely free and open, so any analyst could perform this analysis for places in most of the world's large cities.

The most substantial issue for researchers using social media data lies in its volunteered nature, which can introduce selection bias. In short, the tweets reported with a geo-location might not be enough to represent Twitter users' behavior, and Twitter users' behavior might not be sufficient to describe the behavior of citizens in general. Because individual information about Twitter users is lacking, these two gaps are difficult to bridge, which leaves a major validity issue in the research design. Given the characteristics of Twitter users, I think the result of this analysis would be valid for more urbanized areas with a younger population structure, while remaining questionable if applied to more remote regions.

Fig. 16 The count of tweets in the fishnet polygons.

The second way is data visualization, for which the point visualization process serves as the example here. “A picture is worth a thousand words.” A data visualization map is currently the most effective method both for understanding human behavior data and for communicating the results. For placemaking, it is also the most suitable way to present the information. Without a map, the information collected is just a list of numbers that is nearly impossible to interpret. In the era of big data, the enormous volume of information created by various sensor devices requires much stronger visualization tools than the traditional MS Excel. Such a tool needs to draw a large quantity of points automatically from their location information and to add a real-world basemap that helps readers quickly understand the location. GIS techniques offer the most effective way to accomplish this task.

Fig. 17 The primary hotspot analysis result.

The third way is data analysis, especially for point data. In this article, I used spatial point analysis as an example to identify the most vibrant places for tweeting behavior in downtown Chicago. In the big data age, the enormous amount of information produced by billions of people can no longer be summarized by direct human inspection. Sometimes a straightforward visualization is also not enough to reveal what lies behind the data, as in the case presented in this paper. The analysis methods of GIS could become a set of valid tools that allow researchers in the placemaking field to quickly understand the meaning of the information and therefore improve the effectiveness and efficiency of decisions.

In addition, this research tries to lay a sound foundation for building a future analysis framework for human behavior in the placemaking field. I believe there are four major directions in which this topic could develop further:

• Scale
• Time
• Content
• Network

The first would be enlarging the geographical scale. Since changing the research area is technically only a matter of one parameter setting in the R script, it would not be complicated to transfer the framework in this paper to a study of a much wider area.

The second would be time. As in time-geography studies, time is also an important factor. With a certain time range as a filter, this study topic could be narrowed down to specific urban issues. For instance, which areas in the city host residents' nightlife or weekend life? Are those activities growing or disappearing? What factors might be associated with them?

The third direction is to combine the spatial analysis with content analysis techniques. With both text and images, social media now carry an enormous amount of information that could be mined for urban studies. Using machine learning or deep learning methods, we could detect the public's perceptions and opinions of the city. It would be even more interesting to combine such information with the high-resolution geo-tag and time.

The fourth is the network of relations between people, which includes the mobility network and the communication network. We can analyze how people move based on their geo-tagged tweets, and how people connect with each other based on whom they follow and who follows them. Then, using network clustering algorithms, we would be able to reveal the spatial structure of mobility and social community from such information.
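As a small illustration of the time direction, the sketch below filters tweets to a night-time window before counting them per 100-m cell. It is a minimal example under assumptions, not part of the original workflow: the tuple layout `(timestamp, x, y)`, the 21:00 to 03:00 window, and the names `is_night` and `night_counts_by_cell` are hypothetical, and the timestamps are assumed to have already been converted to local time.

```python
from collections import Counter
from datetime import datetime


def is_night(ts, start_hour=21, end_hour=3):
    """True if the timestamp's hour falls in [start_hour, end_hour).

    The window may wrap around midnight (e.g. 21:00 to 03:00).
    """
    h = ts.hour
    if start_hour <= end_hour:
        return start_hour <= h < end_hour
    return h >= start_hour or h < end_hour


def night_counts_by_cell(tweets, cell=100.0):
    """Count night-time tweets per square cell.

    tweets: iterable of (timestamp, x, y), with x and y in projected
    metres and timestamp a local-time datetime (assumed input layout).
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for ts, x, y in tweets:
        if is_night(ts):
            counts[(int(x // cell), int(y // cell))] += 1
    return counts
```

The resulting per-cell counts could then be passed to the same hotspot analysis as the all-day counts, so that night-time and daytime patterns can be compared.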

Fig. 18 Hotspots of tweet cluster in Chicago downtown.

References

Brownson, R.C., Hoehner, C.M., Day, K., Forsyth, A., Sallis, J.F., 2009. Measuring the built environment for physical activity. State of the science. American Journal of Preventive Medicine 36 (4 (Suppl.)), S99–S123.e12. http://dx.doi.org/10.1016/j.amepre.2009.01.005.
Cervero, R., Kockelman, K., 1997. Travel demand and the 3ds: Density, design and diversity. Transportation Research Part D: Transport and Environment 2 (3), 199–219. http://dx.doi.org/10.1016/S1361-9209(97)00009-6.
Cook, T.D., Campbell, D.T., 1979. Quasi-experimentation: Design and analysis issues in field settings. Houghton Mifflin, Boston, MA.
Craglia, M., Haining, R., Wiles, P., 2000. A comparative evaluation of approaches to urban crime pattern analysis. Urban Studies 37 (4), 711–729. http://dx.doi.org/10.1080/00420980050003982.
Crooks, A., Croitoru, A., Stefanidis, A., Radzikowski, J., 2013. #Earthquake: Twitter as a distributed sensor system. Transactions in GIS 17 (1), 124–147. http://dx.doi.org/10.1111/j.1467-9671.2012.01359.x.
Currid, E., Williams, S., 2010. Two cities, five industries: Similarities and differences within and between cultural industries in New York and Los Angeles. Journal of Planning Education and Research 29 (3), 322–335. http://dx.doi.org/10.1177/0739456X09358559.
Evans-Cowley, J.S., Griffin, G., 2011. Micro-participation: The role of microblogging in planning. SSRN Electronic Journal 1–38. http://dx.doi.org/10.2139/ssrn.1760522.
Ewing, R., Cervero, R., 2010. Travel and the built environment. Journal of the American Planning Association 76, 265–294. http://dx.doi.org/10.1080/01944361003766766.
Ewing, R., Hajrasouliha, A., Neckerman, K.M., Purciel-Hill, M., Greene, W., 2015. Streetscape features related to pedestrian activity. Journal of Planning Education and Research 36 (1), 5–15. http://dx.doi.org/10.1177/0739456X15591585.
Ewing, R., Schmid, T., Killingsworth, R., Zlot, A., Raudenbush, S., 2008. Relationship between urban sprawl and physical activity, obesity, and morbidity. Urban Ecology 567–582. http://dx.doi.org/10.1007/978-0-387-73412-5_37.
Gehl, J., 1987. Life between buildings: Using public space. Island Press, Washington, DC (The City Reade).
Getis, A., Ord, J.K., 1992. The analysis of spatial association. Geographical Analysis 24 (3), 189–206. http://dx.doi.org/10.1111/j.1538-4632.1992.tb00261.x.
Goodchild, M.F., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69 (4), 211–221. http://dx.doi.org/10.1007/s10708-007-9111-y.
Hamidi, S., Ewing, R., Preuss, I., Dodds, A., 2015. Measuring sprawl and its impacts: An update. Journal of Planning Education and Research 35 (1), 35–50. http://dx.doi.org/10.1177/0739456X14565247.
Handy, S.L., Boarnet, M.G., Ewing, R., Killingsworth, R.E., 2002. How the built environment affects physical activity: Views from urban planning. American Journal of Preventive Medicine 23 (2 (Suppl. 1)), 64–73. http://dx.doi.org/10.1016/S0749-3797(02)00475-0.
Hawelka, B., Sitko, I., Beinat, E., Sobolevsky, S., Kazakopoulos, P., Ratti, C., 2014. Geo-located Twitter as proxy for global mobility patterns. Cartography and Geographic Information Science 1–12. http://dx.doi.org/10.1080/15230406.2014.890072.
Hoehner, C.M., Brennan Ramirez, L.K., Elliott, M.B., Handy, S.L., Brownson, R.C., 2005. Perceived and objective environmental measures and physical activity among urban adults. American Journal of Preventive Medicine 28, 105–116. http://dx.doi.org/10.1016/j.amepre.2004.10.023.
Jacobs, J., 1961. The death and life of Great American cities. New York 71. http://dx.doi.org/10.2307/794509.
Lansley, G., Longley, P.A., 2016. The geography of Twitter topics in London. Computers, Environment and Urban Systems 58, 85–96. http://dx.doi.org/10.1016/j.compenvurbsys.2016.04.002.
Lee, R., Sumiya, K., 2010. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks. ACM Press, New York, NY, pp. 1–10. http://dx.doi.org/10.1145/1867699.1867701.
Lynch, K., 1984. Good city form. City
Malleson, N., Andresen, M.A., 2014. The impact of using social media data in crime rate calculations: Shifting hot spots and changing spatial patterns. Cartography and Geographic Information Science 406, 1–10. http://dx.doi.org/10.1080/15230406.2014.905756.
Mehta, V., 2007. Lively streets: Determining environmental characteristics to support social behavior. Journal of Planning Education and Research 27 (2), 165–187. http://dx.doi.org/10.1177/0739456X07307947.
Montgomery, J., 1998. Making a city: Urbanity, vitality and urban design. Journal of Urban Design 3 (1), 93–116. http://dx.doi.org/10.1080/13574809808724418.
Nadai, M.D., Quercia, D., Larcher, R., Lepri, B., 2016. The death and life of Great Italian Cities: A mobile phone data perspective categories and subject descriptors, (iv). Computers and Society; Physics and Society. http://dx.doi.org/10.1145/2872427.2883084.
Ofcom, U.K., 2015. Communications market report. UK Mobile Phone Usage Statistics, London, UK, Chicago.
O’Reilly, T., 2009. What is web 2.0 [ebook]. O’Reilly Media, Inc.
Ord, J.K., Getis, A., 2010. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis 27 (4), 286–306. http://dx.doi.org/10.1111/j.1538-4632.1995.tb00912.x.
Paez, A., Uchida, T., Miyamoto, K., 2001. Spatial association and heterogeneity issues in land price models. Urban Studies 38 (9), 1493–1508. http://dx.doi.org/10.1080/0042098012007676.
Rösler, R., Liebig, T., 2013. Using data from location based social networks for urban activity clustering. In: Geographic information science at the heart of Europe. Springer International Publishing, pp. 55–72.
Schacter, D.L., 1999. The seven sins of memory: Insights from psychology and cognitive neuroscience. American Psychologist 54 (3), 182. http://dx.doi.org/10.1037/0003-066X.54.3.182.
Schweitzer, L., 2014. Planning and social media: A case study of public transit and stigma on Twitter. Journal of the American Planning Association 80 (3), 218–238. http://dx.doi.org/10.1080/01944363.2014.980439.
Steiger, E., Westerholt, R., Resch, B., Zipf, A., 2015. Twitter as an indicator for whereabouts of people? Correlating Twitter with UK census data. Computers, Environment and Urban Systems 54, 255–265. http://dx.doi.org/10.1016/j.compenvurbsys.2015.09.007.
Whyte, W.H., 1980. The social life of small urban spaces. Common Ground 10. http://dx.doi.org/10.1177/089124168201000411.
Widener, M.J., Li, W., 2014. Using geolocated Twitter data to monitor the prevalence of healthy and unhealthy food references across the US. Applied Geography 54, 189–197. http://dx.doi.org/10.1016/j.apgeog.2014.07.017.
