Geospatial Metadata 2.0 – An approach for Volunteered Geographic Information



Computers, Environment and Urban Systems 48 (2014) 35–48

Contents lists available at ScienceDirect

Computers, Environment and Urban Systems journal homepage: www.elsevier.com/locate/compenvurbsys

Geospatial Metadata 2.0 – An approach for Volunteered Geographic Information Mohsen Kalantari ⇑, Abbas Rajabifard, Hamed Olfat, Ian Williamson Centre for SDI and Land Administration, Department of Infrastructure Engineering, The University of Melbourne, Australia

Article info

Article history: Received 9 April 2013 Received in revised form 23 June 2014 Accepted 26 June 2014

Keywords: Geospatial data Metadata VGI Automation Tagging Folksonomy Web 2.0 Crowdsourcing Linked data

Abstract There has been a greater tendency towards embracing the potential of Web 2.0 in knowledge creation, with user contributions becoming significantly important in the development of open data such as Volunteered Geographic Information (VGI). An increasing number of volunteers who add value to geospatial data require effective ways of organising and providing access to value-added or newly created data. Metadata records are often unavailable for VGI, making it problematic for users to discover data in the VGI system. This paper discusses potential approaches that can be used to create metadata for VGI. These include Linked Data, professionally created metadata and metadata created by volunteers. The paper then discusses the shortcomings of these approaches and proposes an unconventional alternative, Geospatial Metadata 2.0. This approach involves VGI users in the creation of metadata, and builds on folksonomies created by them. The paper presents the design and implementation of this idea in a prototype system along with an assessment result. The implementation includes Geospatial Metadata 2.0 add-ons in Geonetwork, and the assessment benefits from the input of metadata experts worldwide. The assessment results indicate that such an approach to metadata creation is promising and has the potential to be employed in VGI systems. © 2014 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +61 383440274. http://dx.doi.org/10.1016/j.compenvurbsys.2014.06.005 0198-9715/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction The Web is a powerful tool for collecting data and creating knowledge. Linking Web pages to one another, websites offering the collective work of Internet users, search engines using network characteristics of the Web rather than just the content of Web documents, and user engagement in buying and selling are examples of how the Web environment collates data from everywhere and everyone (O’Reilly, 2007). In other words, user contributions have become significantly important in the development of the World Wide Web (WWW). The rise of Web usage as part of everyday life affects Spatial Data Infrastructures (SDIs) as well. SDIs are most commonly initiated and coordinated by government authorities and provide authoritative geospatial information to users. The data preparation within SDIs is usually managed in a process where the quality of geospatial data is rigorously examined before it is made available to the public. However, developments in VGI are challenging SDIs (Corcoran, Mooney, & Bertolotto, 2013). Geospatial information

such as road names, addresses and parcel maps is being created in initiatives such as OpenStreetMap (OSM), Wikimapia, and Google My Map. These initiatives enable the collective intelligence of volunteers to provide geospatial data to a wider group (Coleman, Georgiadou, & Labonte, 2009; Coleman, Sabone, & Nkhwanana, 2010; Goodchild, 2007). As Web-connected societies increasingly rely on VGI, scholars have identified structures for more effective integration. Beard (2012) suggests organising VGI around features rather than layers, creating support for multiple representations that capture different contributor perspectives, and integrating around the concept of place. Developments in VGI have created a semantic gap between unstructured geospatial datasets in VGI and high-level ontological concepts (Ballatore & Bertolotto, 2011). Specifically, data provided by volunteers often lack metadata, making it difficult to determine how the data was created, when, and by whom, and therefore calling its fitness for use into question (Mooney & Corcoran, 2012). An example of the lack of metadata in VGI is a search for the Royal Women’s Hospital in Melbourne: Wikimapia returns four records, none of which is acceptable (Fig. 1). Three of the records are incorrect, and the remaining record shows an outdated address. In the latter case, a temporal notation working as a metadata element


Fig. 1. Wikimapia returned four records in a search result for the Royal Women’s Hospital in Melbourne, three of which were incorrect, and the remaining record showed the incorrect position for the hospital.

would enable users to better understand the data available on Wikimapia. The problem in the above example is that the VGI data is incomplete, is therefore not authoritative, and does not meet reasonable quality standards. However, making data lineage available to VGI users would facilitate better understanding of the data’s fitness. Having reliable metadata will better inform users of the data’s utility and relevance. The creation of metadata can, to some extent, foster better understanding and appreciation of data quality, although accurate measurement of VGI data quality could be difficult. This paper proposes an approach to facilitate the creation of metadata for VGI, which will benefit populations that are increasingly reliant on VGI. We suggest a method that provides metadata for VGI via VGI users – Geospatial Metadata 2.0. The remainder of the paper is organised as follows: Section 2 discusses the importance of metadata for VGI and examines the efficiency of current methods of metadata creation if employed for VGI. Section 3 presents an argument on how crowdsourcing can be an effective way of creating metadata for VGI. Section 4 presents a conceptual design of Geospatial Metadata 2.0. Section 5 then presents and evaluates a prototype system implementing the concept. Sections 6 and 7 present discussions and conclusions.

2. Need for VGI metadata Investigations have been undertaken to introduce a systematic approach for the assessment of VGI (Girres & Touya, 2010; Haklay, 2010; Haklay, Basiouka, Antoniou, & Ather, 2010). These investigations are built on a framework (van Oort, 2005) that proposes a set of criteria for the evaluation of geospatial information including lineage, positional accuracy, attribute accuracy, logical consistency, completeness, semantic accuracy, usage and temporal quality. These analyses highlight the inconsistency of VGI in terms of quality. Quality of VGI is dependent on participants and their competence and level of care taken in preparing data. Haklay et al. (2010) review the number of volunteers and its impact on VGI

quality. Their research demonstrates that with an increase in the number of volunteers, positional accuracy increases. However, this is not a linear increase. It has been suggested that VGI should be treated as heterogeneous data and should be evaluated locally and not globally. Girres and Touya (2010) highlight the importance of having specifications to ensure the geospatial data quality for VGI. Their paper argues that while there are specifications for data preparation, they are rich and complex for the volunteers. They then argue for a balance between specification and freedom of contribution. Contrary to this, Coleman et al. (2010) show that the majority of VGI providers contribute infrequently and are typically interested amateurs who could fail to comply with any strict specification. The literature in this domain highlights two major issues with VGI. Firstly, inconsistency arises due to a varied range of volunteers and the different methods employed for data creation. Secondly, there is a lack of efficient specifications that volunteers can use as a guide in creating reliable information. While some inconsistency is expected in VGI, an increase in the number of volunteers could potentially assist in solving the first issue. For the second issue, specifications can be put in place to facilitate the improvement of VGI. Data lineage within specifications would play a significant role in helping users assess suitability for a particular use. However, discovering data in VGI systems still poses problems. It is impossible to discover specific data unless users input the exact title or data feature in the map layer. We compared the search results for a number of examples in two other prominent VGI systems – OSM and Google Map Maker. This comparison shows how metadata could be critical in describing and discovering VGI content. In Google Map Maker, if you search for Melbourne Central, one of the best-known landmarks in Melbourne, you will get an incorrect address with no metadata for the landmark. But if you add the word ‘Dome’ to the search, you will get the right address and a relevant description (Fig. 2). In OSM, unless users input the exact phrase ‘Royal Women’s Hospital’ it is impossible to discover it. This shortcoming is primarily rooted in the limitations of OSM’s search engine which uses


Fig. 2. Search results for Melbourne Central in Google Map Maker with and without the additional word of ‘Dome’.

only data content in returning a search request. We looked at the content of the Royal Women’s Hospital feature in the database of OSM. There are two features: hospital boundary and hospital building (Fig. 3). The database contains the name of the hospital and a note that indicates it is the women’s hospital. This means the search is limited to these contents and therefore the chance of discovery is limited if other search words are used. Google Maps’ search engine is superior and is able to suggest alternative search options, thus making it easier for the user to find the desired data/feature. If we look at the Google Map Maker’s database we see enriched content that contains different descriptions with additional categories that help with faster data discovery (Fig. 4). This enriched content includes different descriptions of the hospital feature. In this paper, this enriched content is regarded as metadata which helps not only with data description but also with its discovery (see Fig. 4). One might argue that one potential solution to the data discovery problem is Linked Data (Berners-Lee, 2009), where different sources from the Web such as OSM, Google Maps and Wikimapia can be used to improve the user experience in accessing enriched content. However, despite the potential of Linked Data, Hartig (2009) argues that Linked Data should be accompanied by metadata in order to increase its utility for users. In this paper, we propose that Geospatial Metadata 2.0 is a highly viable solution and can perform an important function in facilitating VGI fitness for use. In the next section, we first look at the metadata definition to clarify some fundamental concepts. 2.1. What is metadata The Oxford Dictionary defines metadata as a set of data that describes and gives information about other data. Traditional geospatial metadata records based on standards such as ISO 19115,

FGDC and Dublin Core have successfully served SDIs for a long time. Traditional metadata records have been developed to serve geospatial data experts by those professionals who understand the complexity of geographic data. Experts and professionals usually create metadata (Mathes, 2004). In the same way, the foundations of the majority of geospatial data catalogues have been created by professionals. It is now common practice to create metadata based on standards and guidelines for the cataloguing and classification of geospatial data (Green & Bossomaier, 2001). While metadata created by experts are of high quality, the practice of metadata creation is expensive and extremely time consuming (Kalantari, Rajabifard, & Olfat, 2009; Olfat, Kalantari, Rajabifard, Senot, & Williamson, 2012). If employed, professionally created metadata will not be efficient for VGI as it will be difficult to sustain the metadata record for the large volume of VGI produced or updated by volunteers. At present, volunteers are becoming more experienced in using new technologies such as smart phones, the Global Navigation Satellite System (GNSS), satellite imagery and mapping applications and therefore are producing more geospatial data. This issue has also been acknowledged by Ellul, Winer, Mooney, and Foord (2012), who call for the creation of flexible metadata content and systems. An alternative to professionally created metadata is author-created metadata where geospatial data providers create the metadata (Greenberg, Pattuelli, Parsia, & Robertson, 2001) (e.g. the Dublin Core Metadata). Volunteer-created metadata could possibly solve a large number of data problems we face when fully reliant on professionals. Volunteers who create data can also create metadata for what they have added or updated in open maps. Metadata creation could be a mandatory requirement for volunteers. Alternatively, it could be optional for them. For instance, Fig. 5 illustrates a snapshot of functionality in OSM that enables


Fig. 3. Royal Women’s Hospital feature content in OSM.

volunteers to provide additional attributes. This method provides an option for VGI authors to create metadata and describe their contribution in an unstructured form. Exploring the XML extraction of the feature selected in Fig. 5, the additional elements can be used as a foundation for compiling metadata for VGI (Fig. 6). The example shows how this additional attribute can potentially be used to create a metadata record based on standards. Here, we have added an element specified by ISO 19115, the Descriptive Keyword element. However, there is a fundamental issue here. Both professionally created and author-created VGI metadata have a similar disadvantage: users of VGI are not involved in the process. Metadata experts and authors may create metadata that do not necessarily address the needs of users. VGI users are therefore still disconnected from the process and their perception and experience of data remain ignored. The significant missing step of both professionally and author-created metadata is the ability of users to contribute towards VGI without subscribing to the system and becoming a registered volunteer. An alternative approach which can be utilised to create metadata for VGI is crowdsourcing. VGI users can use their own interpretation of a geographic phenomenon to describe it.

Furthermore, users can express their opinions on the description of data, its quality, fitness for purpose etc. (Kalantari, Olfat, & Rajabifard, 2010). This is in line with Bulterman’s (2004) proposition to rehabilitate metadata, in which he argues that locating electronic resources is best done using the textual content rather than catalogues. Although Bulterman (2004) suggests a moratorium on metadata, we argue that traditional catalogue-based metadata can be complemented with user-generated metadata which enriches data content. This method will facilitate user involvement in the process of improving VGI. If users can create metadata for VGI and share their notes with other users, it will help with VGI’s discoverability and add value to content. Through this approach, the collective intelligence of users will help improve the understanding of other VGI users. The next section analyses the potential of crowdsourcing for creating VGI metadata.

Fig. 4. Royal Women’s Hospital feature content in Google Map Maker.

Fig. 5. Assigning attributes to a feature in OSM.

< id="204763809" user="xxxxx" uid="435601" visible="true" version="2" changeset="15355547" timestamp="2013-03-13T19:44:20Z">

Fig. 6. Metadata for a feature in OSM.

3. Why crowdsource VGI metadata? With the expansion of Web content, it is becoming more difficult to manage an ever-increasing volume of bookmarks in browsers. Websites such as ‘delicious’ enable users to create metadata about Web content by saving the URL address of websites, recommending an annotation, allocating tags to assist in subsequent access, and grouping similar URLs together (Fig. 7). There are also approaches that spatially organise Web content (Florczyk et al., 2012). Users are free to choose any tag and assign them any

number of times, allowing the URL to be linked to many perceptions at once. This is where we can explore the benefit of crowdsourcing VGI metadata in comparison with formal systems of describing content (Thomas, Caudle, & Schmitz, 2009). Shirky (2005) argues that the links made by tags are the only suitable way to organise resources on systems as large and disordered as the Web. In a similar manner, we believe crowdsourcing is a potential solution for the creation of metadata for VGI. We have three reasons which lay the foundation for this proposition. 3.1. VGI: a disordered system Traditional classification and description methods for geospatial data are always based on standards and rules. Standards and


Fig. 7. Tagging URLs and saving as bookmarks in delicious.

rules can only create a certain set of classifications for datasets and features in VGI. For instance, one guideline in geographic information may classify the road feature together with the railway layer in a single classification as transportation. Another may separate them into two different classifications: road and public transport layers. Therefore, traditional methods will fall short in classifying and describing VGI from the users’ perspective as there is no single classification. We argue that the crowdsourcing approach can be more effective in classification, because here the crowd does not have to make a decision and limit the data or feature to a guideline or standard they are not familiar with. The users instead bring their own language that is meaningful to them (Shirky, 2005). For instance, Köbben, Huisman, and Lin (2012) demonstrate a simple and accurate tagging process based on Digital Surface Models for users to tag photos in photo-sharing websites that increases the quality of website content. Similarly, VGI users can select different tags to describe the same item. For example, items related to scale may be tagged ‘500’, ‘1:500’, or ‘1/500’. VGI, as a disordered system, should allow users to describe the data sets in a way which they find useful and relevant. 3.2. VGI: large and ever-growing With an increasing number of users, applications and digital geographic features being created, it is difficult to provide descriptions for every VGI feature. Sinha (2005) suggests that tags provided by the crowd work well for descriptions in a large resource like the Web as they reduce the risk of making decisions without perspective. According to Mathes (2004) and Shirky (2005), creation of metadata through tags works for data classification in a large resource by facilitating participation of users in describing content. By tagging features and data sets in VGI, users become the metadata authors and fill in for the custodians of VGI. VGI is a large resource that

should be supported by a user base that can contribute towards classifying and describing it. 3.3. VGI: a system with an unconventional authority Unlike SDIs, which are designed for specific organisational needs, it is impossible to accurately predict how users search for features and data sets in VGI. In other words, it is not feasible to forecast how an ever-increasing number of users and applications might realise, recognise and utilise VGI. Golder and Huberman (2006) argue that criticism towards crowdsourcing is becoming less relevant as the traditional definitions of authority might need to evolve. Their research confirms that the majority of users have the best intentions: they tag their resources extensively, their terms are detailed, and in many cases authoritatively chosen. Tagging is becoming very popular and multi-user tagging forms common vocabularies. Bringing together a group of taggers can create a collaborative tagging system and ultimately result in folksonomies. This collaborative tagging results in a set of descriptions and becomes metadata that are easily understood by users. In this space, Hollenstein and Purves (2010) have studied how users name city neighbourhoods. The study showed that even with errors in semantics and tagging, the quality of user-generated metadata (produced through tagging) is high enough, with adequate precision and accuracy, to describe city neighbourhoods. The metadata created by tagging is not only about how objects are described, but also about adding information regarding the location. The outcomes of this research demonstrate that the users’ overall attitude towards the creation of metadata meets the basic requirements for describing an approximate boundary of a city’s suburbs. In a similar study, Rorissa (2010) has undertaken a comparative study of user-generated Flickr tags and formal index terms. His research suggests a professional indexing tool should follow the underlying


structure of metadata created by users. Users are now becoming the new authority for VGI. In another study, Keßler and Groot (2013) argue that trust is a key factor when dealing with VGI, a dimension which is implied in authoritative data. Moreover, studies of crowdsourced systems demonstrate a considerable heterogeneity in crowd contribution. In other words, the method by which data is collected by the crowd differs from one individual to another, as opposed to the standard methodologies applied in collecting authoritative data (Haklay, 2014). These studies demonstrate there is a different authority behind VGI. As such, its management should also be different. Therefore, VGI metadata, its description and its classification, should reflect the language, expectations and requirements of the new authority. We have described three reasons for crowdsourcing VGI metadata. The next section describes a conceptual design for the creation of Geospatial Metadata 2.0. 4. Geospatial Metadata 2.0: A conceptual design We believe a user-defined collection of tags and descriptions can facilitate metadata generation for VGI. In a VGI system where many users are allowed to tag geographic data, this collection of tags can become a geographic folksonomy, that is, a method which allows collaborative creation and management of VGI metadata. We propose two models for VGI metadata creation. In the first model, we create a database for metadata solely by monitoring users’ interaction, without the users being aware of it. In the second model, we explicitly allow users to create metadata. 4.1. Implicit model In this model, we monitor the search terms used, analyse them, and then use them as descriptors to create content for VGI feature metadata. This implicit model is streamlined in three steps (Fig. 8): monitoring search words, recording search words and assigning search words. The VGI system consisting of geographic data typically provides users with a facility to find a feature, place or location.
Users query data using search terms. The service will then find and retrieve corresponding records with specific search terms. Users will then be able to view the results and decide what data is relevant to their needs. An example is discussed here. The user is searching for ‘Blackburn High School in Victoria, Australia’ in Wikimapia using the search feature provided. In the first instance, results show no matching record for this search (Fig. 9). The first impression is that there is no Blackburn High School recorded in Wikimapia. A change in search terms used provides a different result. By deleting ‘Victoria’ from ‘Blackburn High School in Victoria, Australia’ the system is able to retrieve Blackburn High School from the records and present it on the map (Fig. 10). At first glance, ‘Victoria’ is a reasonably expected search word which should be tagged to the Blackburn High School. A way to address this issue would be to have an administrator assign the word ‘Victoria’ to the Blackburn High School record. However, with such a large number of geographic features on global maps, it would be an extremely costly and impractical initiative to have someone manually assign keywords and create metadata.
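The three steps of the implicit model (monitoring, recording and assigning search words) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ImplicitMetadataStore`, the feature identifiers and the promotion threshold are assumptions made for illustration, and the paper does not state how often a search word must be used before it is assigned to a feature.

```python
from collections import Counter, defaultdict

# Assumption: the number of times a search word must lead a user to a feature
# before it is promoted to a metadata keyword. The paper gives no value.
PROMOTION_THRESHOLD = 3


class ImplicitMetadataStore:
    """Hypothetical store linking search words to the features users view."""

    def __init__(self, threshold=PROMOTION_THRESHOLD):
        self.threshold = threshold
        # feature_id -> Counter mapping each search word to its usage count
        self.counts = defaultdict(Counter)

    def record_view(self, query, feature_id):
        """Monitor a query and record its words against the viewed feature."""
        for word in query.lower().split():
            self.counts[feature_id][word] += 1

    def keywords(self, feature_id):
        """Assign the words that reached the threshold as metadata keywords."""
        words = self.counts.get(feature_id, Counter())
        return sorted(w for w, n in words.items() if n >= self.threshold)
```

In this sketch, a word such as 'victoria' would become a keyword of the Blackburn High School record only after enough users who searched with that word went on to view the record. A real system would probably also filter stop words such as 'in', which this sketch does not do.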


Yet the question this paper raises is whether ‘Victoria’ should be used as a tag or metadata to describe this geographic feature, and whether this needs to be considered from the users’ perspective. A practical way to address the issues highlighted in this paper is to monitor and track search words used by users as collective intelligence when discovering geographic features. Here, we identify the search words used by users looking for features, places or locations and monitor them during the discovery step. We record in a database any search word relevant to an identified geographic feature, and these records form the basis for VGI metadata records. Here, we discover how many times a search word can find a geographic data set. If a search word is used frequently in discovering a feature, it should be regarded as a keyword that is of significance for the users. Increasing use of the same search word illustrates its usefulness. Any frequently utilised search word could be added to the metadata for the VGI feature and its datasets. Recorded search words which are frequently utilised will be assigned to the geographic feature and stored as metadata. Through this method, the commonly used search words (that are semantically linked to features) for finding features are recorded and made available to other users. This method will, over a period of time, create a database of keywords for metadata records related to geographic features through applying and refining appropriate search words (Fig. 10). 4.2. Explicit model The explicit model creates the VGI metadata content directly through comments made by users, as opposed to the implicit model that indirectly creates metadata by monitoring the search words used by users. In the explicit model, users tag a feature based on their knowledge and understanding. The users label the geographic information based on their awareness in the context of using that information, which is usually related to their individual requirements. These tags can be illustrated in a ‘Tag Cloud’. Within the tag cloud, the tags which are used most frequently by users will be highlighted and shown in a larger font (Fig. 11). Users also become implicit moderators of existing tags in the cloud by their choice of tags. In the next section, we build on this by demonstrating a prototype system. 5. Prototype system and evaluation To demonstrate the idea of Geospatial Metadata 2.0, we have implemented and tested a prototype system based on the conceptual design presented in the previous section. In this prototype system, we have implemented implicit and explicit models of Geospatial Metadata 2.0 creation as add-ons in Geonetwork. We were unable to choose an operational VGI system environment for the implementation of Geospatial Metadata 2.0 as it is a new concept that has yet to be tested and could have interfered with an existing operational system. We therefore chose to use Geonetwork. Geonetwork is an open-source geospatial cataloguing system (Rajabifard & Kalantari, 2009). Geonetwork enabled us to mimic the scenarios which we conceptualised in the previous sections. In the prototype system, we have taken a scenario that a VGI user would typically find himself in. The user logs into the system and

Fig. 8. Geospatial Metadata 2.0 creation process based on the implicit model. (Diagram labels: search word; metadata records retrieved; records viewed by users; search word linked to the records; metadata created; steps numbered 1 to 3.)


Fig. 9. User searching for Blackburn High School in Victoria, Australia in Wikimapia.

Fig. 10. A different search result by deleting a relevant keyword.

begins searching using the standard ‘key word’ search facility (see top-left corner in Fig. 13). As user interaction begins, there are three add-ons which enable the system to create Geospatial Metadata 2.0 by compiling folksonomies and sharing user experience on data search and discovery. In this section, we describe each add-on and its design, and present the assessment result for the prototype. For the assessment, we recorded the functionalities of the prototype as video clips and provided them in an online survey with a set of statements for each add-on as users have only limited access to the prototype system. The video clips demonstrated how users search for data. They also showed the impact of user interaction with the system. The participants were able to experience the add-ons and view how they operate. Participants were also exposed to changes in the metadata database. For instance, they could see changes in the Tag Cloud as a result of data download or agree/disagree add-ons. The participants were then asked to rate each statement about the add-ons (Fig. 12). They were also invited to express their

opinions on the add-on specifically, and on the prototype as a whole. The assessment benefits from the input of experts from 26 organisations from Australia, United States, Italy, Indonesia, Pakistan, Spain, and Tanzania. The assessment criteria targeted the functionality of the add-ons. The functionality here relates to the effectiveness of the add-on in creating metadata and helping with data discovery. In Sections 5.1, 5.2 and 5.3 we describe the Geospatial Metadata 2.0 add-ons and in Section 5.4, we present the assessment result. 5.1. Suggestion list as an implicit model In the first add-on, every search query by users is recorded and is provided to the next search user as a suggested list of search words. This differs from Google Chrome’s or Internet Explorer’s drop-down search options that are based on the classification of resources found through the search engine. In this add-on, subsequent searches benefit from previous searches in the


Fig. 11. A demonstrator Tag Cloud representing metadata for Blackburn High School. (Tags shown: Victoria, Blackburn, High, School, Melbourne.)

Fig. 12. A sample question and the rating system of the online survey.

user-generated context (Fig. 13). In this approach, search words (and their frequencies) that previous users search for are provided to the current user. This is particularly useful for users without specific knowledge on what they wish to search for.
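The suggestion list described above can be sketched in a few lines of Python. This is an illustrative assumption rather than the Geonetwork add-on itself: the class name `SuggestionList`, the prefix matching and the `limit` parameter are hypothetical choices, while the idea of returning previous search words together with their frequencies comes from the text.

```python
from collections import Counter


class SuggestionList:
    """Hypothetical suggestion list built from previous users' queries."""

    def __init__(self):
        self.history = Counter()  # normalised query -> number of times used

    def record(self, query):
        """Store a query, ignoring case and repeated whitespace."""
        normalised = " ".join(query.lower().split())
        if normalised:
            self.history[normalised] += 1

    def suggest(self, prefix="", limit=5):
        """Return (query, frequency) pairs that start with the typed prefix,
        most frequent first, ties broken alphabetically."""
        prefix = prefix.lower().strip()
        matches = [(q, n) for q, n in self.history.items() if q.startswith(prefix)]
        matches.sort(key=lambda item: (-item[1], item[0]))
        return matches[:limit]
```

Calling `suggest` with an empty prefix returns the most frequent previous queries overall, which matches the case of a user who does not yet know what to search for.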

5.2. Tag Cloud as an implicit model

The second add-on is the creation of a ‘Tag Cloud’ of the previous search words by bringing them together and presenting them to facilitate discovery of data (Fig. 14). In this add-on, each search word is assigned a weight based on the level of users’ interaction and satisfaction with the discovered data set. For instance, if after finding data the user explores metadata records by clicking on the details button, the add-on assigns a weight of 0.25 to this search word, because the system registers the user’s interest in further exploring the data set. If the user downloads the data, the add-on assigns a weight of 1.0 to the search word, promoting the fitness of the search word for discovering the data set. Each search word is therefore weighted based on its usability for discovering relevant datasets. The ‘Tag Cloud’ collects search words, visually presents the relative importance of these tags to other users, and shares collective knowledge among users. This creates a user-generated taxonomy. As the number of users increases, each dataset develops a ‘tag cloud’, or a cluster of tags describing it. In turn, these descriptions influence other users in their choice of use. Golder and Huberman (2006) also confirm that the most popular tags do present an accurate illustration of the data set or feature, with common descriptive tags being used in far greater proportion to the varied or personally oriented tags.

5.3. New descriptors as an explicit model

The third add-on goes a step further: users tag a data set in the way they think is most appropriate (Fig. 15). Users are asked to provide a new description for data they have discovered. This step is optional and lets users provide an additional descriptor. The tag may be a new description for the data set, or the user may select from previous tags provided by other users. Accordingly, user knowledge of geospatial data sets is recorded and made available to other users. In addition, tags provided by previous users are validated. The tags help users find datasets quickly, and user-generated tags are also shown in the Tag Cloud. In the third add-on, not only can users add their description of a particular data set, but they can also agree or disagree with tags and descriptions provided by previous users (Fig. 16). An agreement or disagreement increases or decreases the weight of the search word and changes its appearance in the tag cloud.
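The weighting rules described for the Tag Cloud and for the agree/disagree step can be sketched as follows. Only the two implicit weights (0.25 for opening the details, 1.0 for a download) come from the text. The accumulation of weights across users, the size of an agree/disagree step (`VOTE_STEP`), the class name `TagCloud` and the linear font-size mapping are assumptions made for illustration.

```python
from collections import defaultdict

# Weights stated in the text for the implicit model.
IMPLICIT_WEIGHTS = {"details": 0.25, "download": 1.0}
# Assumption: the text gives no size for an agree/disagree step.
VOTE_STEP = 0.5


class TagCloud:
    """Accumulates tag weights per data set (hypothetical structure)."""

    def __init__(self):
        self._weights = defaultdict(lambda: defaultdict(float))

    def record_action(self, dataset_id, search_word, action):
        """Add the weight for a 'details' click or a 'download'."""
        self._weights[dataset_id][search_word] += IMPLICIT_WEIGHTS[action]

    def vote(self, dataset_id, tag, agree):
        """Raise the tag weight on agreement, lower it on disagreement."""
        self._weights[dataset_id][tag] += VOTE_STEP if agree else -VOTE_STEP

    def weights(self, dataset_id):
        """Return a plain copy of the current tag weights for one data set."""
        return dict(self._weights[dataset_id])

    def font_sizes(self, dataset_id, min_pt=10, max_pt=30):
        """Map positive weights linearly to font sizes (assumed rendering rule)."""
        positive = {t: w for t, w in self._weights[dataset_id].items() if w > 0}
        if not positive:
            return {}
        top = max(positive.values())
        return {t: round(min_pt + (max_pt - min_pt) * w / top, 1)
                for t, w in positive.items()}


cloud = TagCloud()
cloud.record_action("blackburn-hs", "school", "details")
cloud.record_action("blackburn-hs", "school", "download")
cloud.record_action("blackburn-hs", "melbourne", "details")
cloud.vote("blackburn-hs", "melbourne", agree=True)
cloud.vote("blackburn-hs", "victoria", agree=False)

tag_weights = cloud.weights("blackburn-hs")
sizes = cloud.font_sizes("blackburn-hs")
print(tag_weights, sizes)
```

In this sketch a tag whose weight falls to zero or below simply drops out of the rendered cloud, which is one possible reading of a disagreement changing the tag's appearance.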

5.4. Prototype evaluation

As mentioned above, in order to evaluate the prototype system, we conducted an online survey which involved individuals from 26 organisations from various countries that are involved in spatial


Fig. 13. Subsequent searches benefit from previous searches using the drop-down menu.

data production and management. Rather than comparing the individual add-ons, which have distinct functionalities, we assessed the prototype system as a whole. We asked participants to assess the prototype by answering a list of questions and then providing comments about the system. The former has a quantitative outcome and the latter presents a qualitative assessment.

5.4.1. Quantitative analysis

Regarding prototype functionality, we asked participants to comment on the effectiveness and usefulness of the add-ons in the process of data discovery, creating new metadata content and enriching the content of the current records. Twenty-seven percent of participants strongly agreed and 61% agreed that the add-ons have the potential to improve the content of metadata keywords and data discovery. Eight percent had a neutral opinion and four percent disagreed with this expected role (Fig. 17). Overall, a high level of approval and agreement confirmed the effectiveness of the Geospatial Metadata 2.0 concept. We also asked whether the participants would implement such a concept in their organisations. Eight percent strongly agreed and 77% agreed that Geospatial Metadata 2.0 would be valuable for their organisations. Eight percent presented no specific opinion and seven percent disagreed (Fig. 18). This indicates that the idea of Geospatial Metadata 2.0 can be realistically implemented in organisations dealing with spatial data. We also asked about the weighting system based on search words. Four percent strongly agreed and 73% agreed with the weighting system’s logic, while 15% were neutral and 8% disagreed.

5.4.2. Qualitative analysis

A qualitative analysis of the participants’ comments provides an interesting insight into the Geospatial Metadata 2.0 concept. With

regards to grouping and weighting search words, participants raised the issue of search context and its application. For instance, data on water bodies can be used by both environmental scientists and urban planners. The former might be interested in sampling, flow and other hydrological attributes, while the latter could be interested in viewshed and proximity to services. It was suggested that user interest and purpose should be recognised when searching, categorising and weighting the search words. Participants also commented that a data download by users is a positive indication that the search word is useful in discovering data. However, the participants did not believe that the action of expanding metadata records necessarily meant that the user was looking for such specific data. It was suggested that assigning a weight to this action may add noise to the results rather than making them more accurate. Participants suggested instead that the time spent on metadata details be acknowledged in the weighting, as it can indicate greater relevance between a search word and its related dataset. Participants also suggested that a mechanism be put in place to control agreeing/disagreeing by the same user, so as to avoid multiple inputs from that user.

As is apparent from the evaluation, the concept of Geospatial Metadata 2.0 is promising. This method, however, does not result in the creation of conventional metadata records based on current standards. In this metadata format, users collectively create useful sets of subject descriptors (metadata) in the form of tags for features and datasets. Metadata is derived from VGI users interacting and adding value to the information. We term this metadata for VGI Geospatial Metadata 2.0. While the above prototype demonstrates a collective metadata creation process in a data catalogue system such as Geonetwork, its principle can be applied in any VGI system.
A group of users of VGI can tag a particular geographic data set or feature and create


Fig. 14. Folksonomy presented in a Tag Cloud.

Fig. 15. The users tagging the dataset or validating previous tags.


Fig. 16. User assessing other users’ conception of data.

Fig. 17. The participants’ agreement on the effectiveness and usefulness of the addons.

Fig. 18. The participants’ willingness to implement Geospatial Metadata 2.0.

a descriptive record for the subject being tagged. Prospective users can use this description to examine fitness for the purpose by relying on other user experience.
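Two of the participants' suggestions in the qualitative analysis, weighting by the time spent on metadata details and limiting repeated agree/disagree input from the same user, could be sketched as below. This is an illustration only. The function `details_weight`, the 60-second saturation point, the class `VoteLedger` and the use of an opaque `user_key` to identify a user are all assumptions, not features of the prototype.

```python
def details_weight(seconds_on_details, full_weight=0.25, saturation_s=60.0):
    """Scale the details-view weight by time spent, capped at full_weight.

    The 0.25 ceiling is the details weight given in the text; the linear
    ramp and the saturation time are assumptions.
    """
    if seconds_on_details <= 0:
        return 0.0
    return full_weight * min(seconds_on_details / saturation_s, 1.0)


class VoteLedger:
    """Keeps only the latest agree/disagree per (user, data set, tag)."""

    def __init__(self):
        self._votes = {}

    def cast(self, user_key, dataset_id, tag, agree):
        """Record a vote; a repeat vote by the same user overwrites the old one."""
        self._votes[(user_key, dataset_id, tag)] = 1 if agree else -1

    def net(self, dataset_id, tag):
        """Return agreements minus disagreements for one tag on one data set."""
        return sum(value for (_, d, t), value in self._votes.items()
                   if d == dataset_id and t == tag)


ledger = VoteLedger()
for _ in range(3):
    # The same (hypothetical) user agreeing three times counts once.
    ledger.cast("user-a", "ds1", "school", agree=True)
ledger.cast("user-b", "ds1", "school", agree=False)

print(ledger.net("ds1", "school"), details_weight(30))
```

Keeping the latest vote rather than the first lets a user change their mind without inflating the count; either policy would satisfy the participants' concern about repeated input.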

6. Discussion on Geospatial Metadata 2.0

Through the prototype evaluation, we demonstrated that 78% of the assessors confirmed that authors and users of VGI will both potentially benefit from the use of crowdsourcing to create metadata for VGI. While there is agreement that Geospatial Metadata 2.0 is by and large a promising approach, the qualitative analysis suggests there are areas for improvement. The assessment reveals that in the Suggestion List and Tag Cloud add-ons (implicit models), more consideration should be given to search context and application. The assessment also suggests that for the New Descriptor add-on (explicit model), user interest and purpose should be taken into account. Hence, because metadata created by either the implicit or the explicit model uses only a user-centric approach, it may shed light only on the perception of users and their level of knowledge of the geographical features they search for. We argue, however, that larger-scale user contributions can address some of the problems of metadata created in both implicit and explicit models, as users of a crowdsourcing-based system tend to notice the current use of ‘terms’ within these systems, and thus are encouraged to use existing terms in order to easily form connections to related items.

There are existing methods in both implicit and explicit models that can be utilised to deal with the issues highlighted above. For example, Gahegan, Luo, Weaver, Pike, and Banchuen (2009) discuss a means to create metadata for geospatial information resources through ontological structures. In this method, user contributions of geospatial resources are complemented by a system-centric approach in which semantics and ontologies are used to link search context to data content. For user interest, Angeletou et al. (2011) present an approach to describe an online resource by labelling its users and monitoring their role based on the behaviour they exhibit. The important qualification for this approach is the need for users to be registered in the system, as unregistered users cannot be monitored. While the system benefits from a user-centric approach, it is restricted to a reliance on those who consent to an online identity.
Results from the assessment also suggest that more consideration should be given to the time spent on metadata details and that a mechanism should be introduced to control repeated input by the same user. Time and identification functionalities can be built into the system by embedding relevant interactivity measures such as browsing duration and Internet Protocol (IP) address. In the context of our prototype system, the explicit model relies heavily on user willingness to contribute. On the other hand, the implicit model relies on computer-run algorithms. Both implicit


and explicit models are complementary and they are both critical for the success of the method proposed in this paper.

7. Conclusion

This paper discussed the issue of metadata in VGI and the associated difficulties of data discovery in VGI. The paper then critiqued existing approaches which rely primarily on data custodians/collectors. It then argued that tagging and folksonomy, built on VGI user interactions within a system, can lay the groundwork for VGI metadata. The method proposed here for creating VGI metadata builds on crowdsourcing, mainly relying on users to describe VGI. As users create and add to VGI, they can progressively provide more descriptions, which can then constitute metadata. Building on these concepts, the paper introduced Geospatial Metadata 2.0, an approach to create metadata automatically for VGI through implicit and explicit user interaction. This approach allows users to add to VGI with or without registration. A conceptual design and implementation for this approach were then presented. The prototype was tested with the involvement of metadata experts. The results indicate that this approach has the potential to address the lack of VGI metadata. Currently, VGI systems such as Wikimapia allow for editing and tagging without requiring their users to be registered. The proposed approach can be built into these systems, so that Geospatial Metadata 2.0 can be realised. In addition to VGI, authoritative sources of geospatial data could also benefit from Geospatial Metadata 2.0 by comparing crowdsourced data against what data administrators would consider accurate and complete metadata for features and datasets.
Poor understanding of how geospatial metadata is created, a lack of knowledge in using metadata among geospatial data users and the limited benefits of current metadata systems for users suggest a need for a new approach to metadata, which is currently an unrecognised but critically important component of geospatial information management. This is rooted in inconsistent metadata inclusion, poor definition of content, lack of efficiency for users and, more importantly, minimal use by search engines. While we demonstrated the use of Geospatial Metadata 2.0 for VGI, the concept can also be applied to non-VGI geospatial data. The metadata for authoritative geospatial datasets can be tagged, criticised or further described as users utilise them. Such crowdsourced content can be regarded as an evolutionary metadata system. Metadata content which is created by users will become meaningful to subsequent users. Geospatial Metadata 2.0 could improve current dataset-centric approaches and provide a mechanism to create feature-centric metadata. Future research can delve further into Geospatial Metadata 2.0 from three perspectives. Firstly, how geospatial data descriptions can be better enriched through users from a variety of backgrounds with different intentions for data usage. Secondly, how geospatial data can be better organised using Geospatial Metadata 2.0 so that data can be discovered through non-geospatial search engines. Thirdly, how Geospatial Metadata 2.0 can be used to enrich current geospatial dataset records with metadata on individual features within datasets. Geospatial Metadata 2.0 has the potential to revolutionise geospatial data management in the domain of SDI.

Acknowledgements

The authors wish to acknowledge the support of the Australian Research Council for funding this research and the members of the Centre for Spatial Data Infrastructures and Land Administration at the Department of Infrastructure Engineering, the University of


Melbourne, in the preparation of this article and the associated research. We also acknowledge feedback and comments from Professor Michael Goodchild in titling this paper. However, the views expressed in the article are those of the authors and do not necessarily reflect those of these groups.

References

Angeletou, S., Rowe, M., & Alani, H. (2011). Modelling and analysis of user behaviour in online communities. In L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. Noy, & E. Blomqvist (Eds.), The semantic web – ISWC 2011 (pp. 35–50). Springer Berlin/Heidelberg.
Ballatore, A., & Bertolotto, M. (2011). Semantically enriching VGI in support of implicit feedback analysis. In Proceedings of the 10th international conference on Web and wireless geographical information systems (pp. 78–93). Kyoto, Japan: Springer-Verlag.
Beard, K. (2012). A semantic web based gazetteer model for VGI. In Proceedings of the 1st ACM SIGSPATIAL international workshop on crowdsourced and volunteered geographic information (pp. 54–61). Redondo Beach, California: ACM.
Berners-Lee, T. (2009). Linked data-the story so far. International Journal on Semantic Web and Information Systems, 5, 1–22.
Bulterman, D. C. A. (2004). Is it time for a moratorium on metadata? IEEE Multimedia, 11, 10–17.
Coleman, D. J., Geogiadou, Y., & Labonte, J. (2009). Volunteered geographic information: The nature and motivation of producers. International Journal of Spatial Data Infrastructure Research, 4, 332–358.
Coleman, D. J., Sabone, B., & Nkhwanana, N. (2010). Volunteering geographic information to authoritative databases: Linking contributor motivations to program effectiveness. Geomatica, 64, 383–396.
Corcoran, P., Mooney, P., & Bertolotto, M. (2013). Analysing the growth of OpenStreetMap networks. Spatial Statistics.
Ellul, C., Winer, D., Mooney, J., & Foord, J. (2012). Bridging the gap between traditional metadata and the requirements of an academic SDI for interdisciplinary research. In GSDI 14, Spatially Enabling Government, Industry and Citizens, Addis Ababa.
Florczyk, A., López-Pellicer, Fj., Nogueras-Iso, J., & Zarazaga-Soria, Fj. (2012). Automatic generation of geospatial metadata for web resources. International Journal of Spatial Data Infrastructures Research, 7, 151–172.
Gahegan, M., Luo, J., Weaver, S. D., Pike, W., & Banchuen, T. (2009). Connecting GEON: Making sense of the myriad resources, researchers and concepts that comprise a geoscience cyberinfrastructure. Computer Geoscience, 35, 836–854.
Girres, J.-F., & Touya, G. (2010). Quality assessment of the French OpenStreetMap dataset. Transactions in GIS, 14, 435–459.
Golder, S. A., & Huberman, B. A. (2006). Usage patterns of collaborative tagging systems. Journal of Information Science, 32, 198–208.
Goodchild, M. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69, 211–221.
Green, D., & Bosomair, T. (2001). Online GIS and spatial metadata. New York: Francis and Taylor.
Greenberg, J., Pattuelli, M. C., Parsia, B., & Robertson, W. D. (2001). Author-generated Dublin core metadata for web resources: A baseline study in an organization. Digital Information, 2, 78.
Haklay, M. (2010). How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37, 682–703.
Haklay, M. (2014). Assertions on crowdsourced geographic information & citizen science [online]. [Accessed: 30.03.14].
Haklay, M., Basiouka, S., Antoniou, V., & Ather, A. (2010). How many volunteers does it take to map an area well? The validity of Linus' law to volunteered geographic information. The Cartographic Journal, 315.
Hartig, O. (2009). Provenance information in the web of data. In LDOW.
Hollenstein, L., & Purves, R. (2010). Exploring place through user-generated content: Using Flickr tags to describe city cores. JOSIS, 1, 21–48.
Kalantari, M., Rajabifard, A., & Olfat, H. (2009). Spatial metadata automation. In Surveying and Spatial Sciences Institute Biennial International Conference, Adelaide, South Australia.
Kalantari, M., Olfat, H., & Rajabifard, A. (2010). Automatic spatial metadata enrichment: Reducing metadata creation burden through spatial folksonomies. In A. Rajabifard, J. Crompvoets, & M. Kalantari (Eds.), Spatially enabling society, research, emerging trends and critical assessment (pp. 248). Leuven: Leuven University Press.
Keßler, C., & Groot, R. (2013). Trust as a proxy measure for the quality of volunteered geographic information in the case of OpenStreetMap. In D. Vandenbroucke, B. Bucher, & J. Crompvoets (Eds.), Geographic information science at the heart of Europe (pp. 21–37). Springer International Publishing.
Köbben, B., Huisman, O., & Lin, H. (2012). Combining VGI with viewsheds for photo tag suggestion. In G. Gartner & F. Ortag (Eds.), Advances in location-based services (pp. 181–190). Berlin Heidelberg: Springer.
Mathes, A. (2004). Folksonomies – Cooperative classification and communication through shared metadata [online].
Mooney, P., & Corcoran, P. (2012). The annotation process in OpenStreetMap. Transactions in GIS, 16, 561–579.
Olfat, H., Kalantari, M., Rajabifard, A., Senot, H., & Williamson, I. P. (2012). A GML-based approach to automate spatial metadata updating. International Journal of Geographical Information Science, 27, 231–250.
O’Reilly, T. (2007). What is Web 2.0: Design patterns and business models for the next generation of software. Communications & Strategies, No. 1, p. 17, First Quarter 2007.
Rajabifard, A., Kalantari, M., & Binns, A. (2009). SDI and metadata entry and updating tools. In B. Van Loenen, J. W. J. Besemer, & J. A. Zevenbergen (Eds.), SDI convergence. Research, emerging trends, and critical assessment (pp. 121–136). Delft.
Rorissa, A. (2010). A comparative study of Flickr tags and index terms in a general image collection. JASIST, 61, 2230–2241.
Shirky, C. (2005). Ontology is overrated: Categories, links, and tags [online].
Sinha, R. (2005). A cognitive analysis of tagging [online].
Thomas, M., Caudle, D. M., & Schmitz, C. M. (2009). To tag or not to tag? Library Hi Tech, 27, 411–434.
Van Oort, P. (2005). Spatial data quality: From description to application. NCG, Nederlandse Commissie voor Geodesie.