Computers, Environment and Urban Systems 53 (2015) 1–3
Volunteered Geographic Information: Towards the establishment of a new paradigm
Bin Jiang a, Jean-Claude Thill b
a Division of Geomatics, University of Gävle, Sweden
b Dept. of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC 28269, USA
Article info
Article history: Received 18 March 2014; received in revised form 9 January 2015; accepted 10 January 2015
Abstract
Volunteered Geographic Information (VGI) is user-generated content that is associated with spatial coordinates. This position paper places VGI in the broader context of data sciences, underscores its most critical properties, identifies its contribution to the emergence of new social science analytics for urban built environments, and presents some of the remaining challenges. © 2015 Published by Elsevier Ltd.
E-mail addresses: [email protected] (B. Jiang), [email protected] (J.-C. Thill)
http://dx.doi.org/10.1016/j.compenvurbsys.2015.09.011
0198-9715/© 2015 Published by Elsevier Ltd.
What motivated us to edit this special issue of Computers, Environment and Urban Systems is that the large amount of volunteered geographic information (VGI) created by volunteers through crowdsourcing represents a new phenomenon arising from a whole host of Web 2.0 technologies (Goodchild 2007; Sui, Elwood, & Goodchild 2013), ranging from social media services to blogs, wikis, and others. VGI constitutes one of the most important types of user-generated content and is quickly becoming a new type of asserted geographic information, complementing the traditional authoritative geographic information collected by governmental agencies or private organizations. In the context of this special issue, VGI is broadly defined as any georeferenced information that is freely distributable without copyright constraints for sharing among those interested. From this perspective, VGI, such as Flickr photos and their geotags, digital tickets and travel cards, tweets and their locations, and cell phone check-ins, is more than just a new type of data. It establishes a new paradigm for socio-spatial research and the continuous monitoring of the changing landscape of behaviors, opinions, attitudes, and social interactions in fast-evolving urbanized societies.
In this editorial, we set out to discuss some of our personal reflections on VGI. What are its ‘big data’ properties and how does it differ from small data? What are the fundamental analytical technologies applicable to it and the newly emerging challenges embedded in VGI? How does VGI open a new pathway to understanding in social and behavioral sciences? As will become clear through this position paper, the discourse on VGI has evolved considerably over the past few years, and VGI cannot be conceived in isolation from other concepts of data science.
It is now well accepted (e.g., Mayer-Schonberger & Cukier 2013) that big data can be characterized by several Vs such as volume, variety, velocity, and veracity (Laney 2001). Big data has an enormous impact on science, and on social science in particular (Lazer et al. 2009; Watts 2007). Geographic information constitutes a very important type of big data, thanks to geospatial technologies such as the global positioning system (GPS), geographic information systems (GIS), and remote sensing. Geospatial technologies have dramatically enabled social media, the Internet, and mobile devices; consequently, large volumes of georeferenced information about things and human beings are available for a better understanding of environment and urban systems.
Big data differs fundamentally from small data in terms of data characteristics. Small data are mainly sampled (e.g., census or statistical data), while big data are automatically harvested from a large population of users or crowds, e.g., using crawling techniques or the application programming interfaces provided by social media providers and information management systems. Given the large population, big data are often referred to as ‘all’ rather than a small part of the whole; in other words, small data is like an elephant seen by a blind man, while big data is the elephant itself. Small data, such as census data, are essentially estimated and shared as an aggregate, while big data such as VGI and social media data are available at the elemental scale, possibly measured at very fine spatial and temporal
resolutions. These three distinguishing characteristics (all, measured, and individual) make big data big and differentiate them from their small data counterparts. This in turn opens new opportunities while bringing out new challenges.
Big data differs fundamentally from small data in terms of data analytics. Considering their distributional properties, georeferenced features are typically very diverse and heterogeneous. Hence, they are likely to demonstrate a heavy-tailed distribution (Anderson 2006) that lacks a characteristic mean, a property shared by scale-free systems. A Paretian way of thinking that is consistent with power-law statistics is more appropriate for dealing with such heterogeneity (Jiang 2015). By contrast, in the small data era, a Gaussian way of thinking, or Gaussian statistics, has been widely adopted based on the assumption that features can be characterized by a well-defined mean. Researchers who ignore power-law effects and rely on a wrong data theory risk drawing false conclusions and missing interesting relationships embedded in the data. The growing ubiquity of diversity and heterogeneity in social systems refocuses interest on opinions and events that are rare. Gaussian statistics may dismiss them as outliers, mischaracterize them, or even miss them altogether as part of small data sampling schemes. Such research would still be evidence-based, yet it would constitute a clear failure of the scientific method, with tremendous consequences in domains such as disease epidemiology, emergency management and response, crime analysis, and traffic safety.
By design, VGI may be collected repeatedly or continuously. Multiple observations may be available for the same objects or individuals under various and evolving circumstances. While conventional evidence-based research is either cross-sectional or exploits sparse time series, VGI analytics can probe the dynamical properties of social systems.
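The contrast drawn above between Gaussian and Paretian thinking can be illustrated with a small numerical sketch. It is not taken from any of the studies cited here; the sample size, the seed, and the distribution parameters (a Gaussian with mean 100 and standard deviation 15, a Pareto distribution with shape 1.2) are assumptions chosen purely for illustration, and `share_above_mean` is a hypothetical helper name. The sketch compares how well the arithmetic mean summarizes a Gaussian sample and a heavy-tailed one.

```python
import random
import statistics


def share_above_mean(values):
    """Return the fraction of values strictly greater than the arithmetic mean."""
    mean = statistics.fmean(values)
    return sum(v > mean for v in values) / len(values)


rng = random.Random(42)  # arbitrary fixed seed, for repeatability only
n = 100_000              # illustrative sample size

# Gaussian sample (illustrative parameters: mean 100, standard deviation 15)
gaussian = [rng.gauss(100, 15) for _ in range(n)]

# Heavy-tailed sample: Pareto with shape 1.2 (illustrative parameter)
pareto = [rng.paretovariate(1.2) for _ in range(n)]

for name, sample in (("gaussian", gaussian), ("pareto", pareto)):
    print(
        name,
        "mean=%.2f" % statistics.fmean(sample),
        "median=%.2f" % statistics.median(sample),
        "share above mean=%.3f" % share_above_mean(sample),
    )
```

In the Gaussian sample, roughly half of the values lie above the mean, so the mean is a fair description of a typical value. In the Pareto sample, only a small minority of values exceed the mean, which sits well above the median: the mean describes neither the many small values nor the few very large ones.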
GPS tracks associated with taxi probes and cell phone check-ins are good examples of new datasets that permit surveying at a fine space–time resolution. Here, the georeferenced data in question are not of the traditional variety (point or polygon); they are relational. Each track is indicative of the functional relationship that exists between a series of locations, as experienced by an individual or agent. Thus, each track is a functional footprint of the urban space. Mining these tracks reveals the pulse of an urban social system as it responds to external stimuli across spatial and temporal scales. The Internet of Things, touted as the inescapable outcome of the pervasive embedding of information and communication technologies in organic and inorganic entities (from household items to food items, from organs to animals, from vehicles to parcels), will amplify the relevance of urban informatics towards the emergence of smart cities. Thus, “social sensing” (Liu et al. 2015), currently powered by various forms of VGI, will converge towards the unifying concept of Pervasive Digital Earth across scales of the built and natural environments.
VGI that originates in social media may encompass a rich array of textual information posted by the originator of the communication. This information may be difficult to decipher due to the restrictions specific to each media platform or to the circumstances of the exchange. While much of the information that is traditionally georeferenced is observational, i.e., the data is a fairly objective recording of an event, an action, or a feature property, social media have emerged as an effective and widely popular platform where views, opinions, feelings, and emotions are shared with others. This singular capability sets them apart from
other VGI sources. Social media have placed within the easy reach of a broad base of social scientists data that, prior to the advent of Web 2.0, were rather exclusive. Semantic analysis of textual data posted on social media is different from other polling and opinion survey techniques. It is based on unstructured data streamed directly from the media platform rather than extracted from carefully designed survey questionnaires. The information is unsolicited, which may suggest that it is free of certain systematic biases, such as those that may arise from protest zeros and from respondents’ tendency to under-report their willingness to pay more taxes for an expansion of public services. On the other hand, as the data is not collected through a carefully controlled experiment, the circumstances that shape the opinions and emotions cannot readily be traced. Opinions shared on social media are known to be often emotional, reactive, and sometimes designed to be provocative. Thus, “sentiment analysis” is still a field in the making; the fact of the matter is that opinions and views can be deeply shaped by exposure to social media content. In spite of these recognized challenges, sentiment analysis profoundly enriches social science research by affording the study of opinions across finer segments of individuals (especially on the basis of their lived experiences and their geographic settings) as well as the stationarity of opinions, views, and sentiments in the context of media exposure, or conversely their sharp fluctuations in response to dramatic events (war, riots, natural disaster, etc.). All things considered, VGI is instrumental to the emergence of social science analytics that is more sensitive to the socio-spatial setting of each human being and to the changing nature of sentiments.
The multiplication of channels delivering VGI is only starting to be grappled with in spatial data science. Various media have their own market niche.
Just as not everyone is a heavy cell phone user, not everyone holds a Twitter account, and not everyone hires taxis. The large volume of data that may be available from each individual source may give false confidence that the derived research is robust and representative. On the contrary, communication technology markets are segmented by age, geography, gender, socio-economic status, and so on. Market segmentation therefore suggests that selection biases are pervasive and that generalizable conclusions can only be reached through the fusion of data from various sources, whether they exhibit big data properties (like VGI) or a mix of big data and small data properties, like conventional data sources. From a geospatial perspective, this fusion makes it possible to grasp more fully the deeply textured nature of spaces constructed through social interactions between socio-economic agents, whether on the basis of conventional transportation technologies or of more cutting-edge communication technologies, as advanced by Thill (2011).
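The argument about market segmentation can be made concrete with a small simulation. Everything in it is hypothetical: the two age groups, their population shares, their opinion rates, and their platform adoption rates are invented for illustration. The reweighting step is only one simple way in which fusion with small data (here, group shares assumed to be known from a source such as a census) might work. It relies on the further assumption that, within each group, platform users hold the same opinions as non-users.

```python
import random

rng = random.Random(7)  # arbitrary fixed seed, for repeatability only

# Hypothetical population: share of each group, probability of holding a
# given opinion, and probability of using a (hypothetical) media platform.
groups = {
    "young": {"share": 0.3, "p_opinion": 0.8, "p_user": 0.6},
    "old": {"share": 0.7, "p_opinion": 0.3, "p_user": 0.1},
}


def draw_person():
    """Draw one person as (group name, holds opinion?, uses platform?)."""
    name = "young" if rng.random() < groups["young"]["share"] else "old"
    g = groups[name]
    return name, rng.random() < g["p_opinion"], rng.random() < g["p_user"]


population = [draw_person() for _ in range(200_000)]
true_rate = sum(op for _, op, _ in population) / len(population)

# "Big" data: every platform user is harvested, with no sampling design.
users = [(name, op) for name, op, uses in population if uses]
naive_rate = sum(op for _, op in users) / len(users)


def group_rate(name):
    """Opinion rate observed among platform users of one group."""
    opinions = [op for n, op in users if n == name]
    return sum(opinions) / len(opinions)


# Fusion with small data: weight the group-level rates seen on the platform
# by the population shares assumed to be known from an external source.
fused_rate = sum(g["share"] * group_rate(name) for name, g in groups.items())

print("users harvested:", len(users))
print("true=%.3f naive=%.3f fused=%.3f" % (true_rate, naive_rate, fused_rate))
```

Under these assumptions, the harvested sample contains tens of thousands of users, yet its raw opinion rate is far from the population rate because young people are over-represented on the platform. Reweighting by the externally known group shares brings the estimate close to the population rate. Volume alone does not correct the bias.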
References
Anderson, C. (2006). The Long Tail: Why the Future of Business Is Selling Less of More. New York: Hyperion.
Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221.
Jiang, B. (2015). Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity. GeoJournal, 80(1), 1–13.
Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity and Variety. META Group. Retrieved 30 March 2015.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., ... Van Alstyne, M. (2009). Computational social science. Science, 323, 721–724.
Liu, Y., Liu, X., Gao, S., Gong, L., Kang, C., Zhi, Y., ... Shi, L. (2015). Social sensing: A new approach to understanding our socioeconomic environments. Annals of the Association of American Geographers, 105(3), 512–530.
Mayer-Schonberger, V., & Cukier, K. (2013). Big Data: A revolution that will transform how we live, work, and think. New York: Eamon Dolan/Houghton Mifflin Harcourt.
Sui, D., Elwood, S., & Goodchild, M. (2013). Crowdsourcing Geographic Knowledge: Volunteered Geographic Information (VGI) in Theory and Practice. Berlin: Springer.
Thill, J.-C. (2011). Is spatial really that special? A tale of spaces. In V. Popovich, C. Claramunt, T. Devogele, M. Schrenk, & K. Korolenko (Eds.), Information fusion and geographic information systems: Towards the digital ocean (Lecture notes in geoinformation and cartography) (pp. 3–11). Heidelberg: Springer.
Watts, D. (2007). A twenty-first century science. Nature, 445, 489.