ScienceDirect Procedia Computer Science 72 (2015) 1 – 4
The Third Information Systems International Conference
Integrating Streaming Data into Semantic Mashups
A Min Tjoa, Peter Wetz, Elmar Kiesling, Tuan-Dat Trinh, Ba-Lam Do
TU Wien, Favoritenstraße 9-11/188, Vienna, 1040, Austria
Abstract

Streaming data, i.e., frequently changing and potentially infinite data flows, are becoming more and more prevalent in today’s interconnected world. In a Smart City context, for instance, citizens can access real-time information about a city’s weather, pollution, or traffic conditions. However, turning vast real-time data streams into actionable knowledge poses significant challenges. First, heterogeneous data from varied sources needs to be normalized and integrated, ideally using a common structured vocabulary. Second, to provide users with both real-time and aggregate views on the data, data stream access requires support for temporal operators. Finally, it is difficult for nontechnical users to integrate and leverage streaming data. To tackle these challenges, we propose an extension of the Linked Widgets concept for the streaming data domain. In this paper, we outline a platform prototype for the creation of streaming data mashups that is currently under development and illustrate its potential by means of a use case based on real-time city bike data from the city of Vienna.
Keywords: stream processing, semantic mashups
1877-0509 © 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of Information Systems International Conference (ISICO2015) doi:10.1016/j.procs.2015.12.098

1. Introduction

Against the background of broad developments such as the Internet of Things, Smart Cities, and Smart Devices, data processing requirements are increasingly shifting from a static and persistent paradigm towards a continuous streaming paradigm. In particular, novel applications foster the need for efficient processing of and reasoning over dynamic data. In various domains, making sense of frequently changing data flows to draw timely conclusions about the state of a system is crucial. This requires proper means to consider the temporal dimension of the data. Examples include data fusion in a smart city context, environmental monitoring, public transport, and health care. The Semantic Web provides powerful means to integrate data from heterogeneous sources, but has traditionally not considered frequently changing data that arrives in a streaming fashion. Recently,
several efforts such as C-SPARQL [1], CQELS [2], and SPARQLStream [3] have been made to enable querying and reasoning over semantic data streams. Generally, these approaches extend the SPARQL query language with so-called window operators, which divide an unbounded and potentially infinite data stream into queryable windows.

Despite these promising developments, semantic data streams are still largely out of reach from an end user’s perspective. How can end users utilize and process such streams? How can they obtain actionable insights based on streaming data?

To make Linked Data easily accessible, integrable, and processable for end users, we previously introduced the concept of Linked Widgets [4, 5]. This approach allows users to create mashups for the dynamic integration of heterogeneous data sources. To this end, we created a platform that allows users to build mashups composed of data, processing, and visualization widgets. Widgets are further categorized as local widgets or server widgets, depending on where they are executed. Typically, server widgets are used for longer-running processing or monitoring tasks, whereas local widgets are more useful for small data transformation tasks. Advanced features such as terminal matching and automatic mashup composition make use of semantic web techniques and ultimately facilitate mashup composition for end users [6].

In this keynote paper, we discuss recent work towards extending this concept to include streaming data, which will allow users to create real-time mashups on the Linked Widgets platform. The extension is based on the idea of representing the whole data processing pipeline and supporting the user in creating semantic data streams, building end user mashups, and visualizing the processed data flows to gain real-time insights. We illustrate the potential of this approach by means of a real-world city bike use case.

2. Architecture

In order to implement streaming data mashups, we need to extend the platform’s architecture to support C-SPARQL continuous queries.

[Figure 1: architecture diagram. Labels: Domain Ontology, SSN Ontology, Data Cube Ontology, CSV, JSON, data, Triplification, triples, C-SPARQL Server, CQ results, Mashup Server, XML, Streaming Widgets, mapping, extended RML Mapping]
Fig. 1. Streaming Widgets Architecture

After triplification, the data is streamed to the C-SPARQL server, where continuous queries (CQ) are registered. These queries are evaluated continuously and send their results to an HTTP endpoint. Streaming widgets can then retrieve the data via this endpoint and process and visualize it accordingly.
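To make the triplification step more concrete, the following is a minimal sketch of how a single JSON-like station record could be turned into timestamped triples before being streamed to the query server. It is an illustration only: the namespace `http://example.org/citybike#`, the field names (`id`, `name`, `free_bikes`, `free_boxes`), and the function `triplify` are assumptions, and the mapping is hard-coded here rather than driven by the extended RML mapping shown in the architecture.

```python
from datetime import datetime, timezone

# Hypothetical namespace; the vocabulary IRIs actually used are not given in the text.
EX = "http://example.org/citybike#"


def triplify(record, observed_at):
    """Turn one JSON-like station record into timestamped triples.

    Each result is (subject, predicate, object, timestamp), mimicking an
    element of an RDF stream. Field names are assumptions for illustration.
    """
    subject = f"{EX}station/{record['id']}"
    stamp = observed_at.isoformat()
    return [
        (subject, f"{EX}name", record["name"], stamp),
        (subject, f"{EX}availableBikes", record["free_bikes"], stamp),
        (subject, f"{EX}availableBoxes", record["free_boxes"], stamp),
    ]


# Made-up sample record, not real city bike data.
record = {"id": 42, "name": "Station A", "free_bikes": 7, "free_boxes": 13}
triples = triplify(record, datetime(2015, 11, 2, 9, 30, tzinfo=timezone.utc))
for triple in triples:
    print(triple)
```

Attaching the observation time to every triple is what later allows window-based queries to select only the recent part of the stream.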
3. Use Case

We evaluated a prototypical implementation of the presented architecture with a real-time city bike data use case from the city of Vienna. A set of continuous queries provides widgets with real-time and aggregated views on the stream. These widgets output data about a city bike station’s available bikes and available boxes, as can be seen in Figure 2. Additional widgets such as the Station Filter or Map Viewer can be used to filter the data by station or to visualize it on a map, respectively.
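The difference between a real-time view and an aggregated view can be illustrated with a time-based sliding window, which is the idea behind the window operators of stream query languages. The sketch below is plain Python rather than a C-SPARQL query, and it is not the prototype's implementation; the class `SlidingWindow`, the ten-minute range, and the sample readings are assumptions chosen for illustration. It also assumes that readings arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindow:
    """Time-based sliding window over (timestamp, station, bikes) readings."""

    def __init__(self, range_seconds):
        self.range_seconds = range_seconds
        self.readings = deque()

    def add(self, timestamp, station, bikes):
        """Append a reading and evict everything older than the window range."""
        self.readings.append((timestamp, station, bikes))
        cutoff = timestamp - self.range_seconds
        while self.readings and self.readings[0][0] <= cutoff:
            self.readings.popleft()

    def latest(self):
        """Real-time view: most recent value per station in the window."""
        view = {}
        for _, station, bikes in self.readings:
            view[station] = bikes
        return view

    def average(self):
        """Aggregated view: mean value per station within the window."""
        values = defaultdict(list)
        for _, station, bikes in self.readings:
            values[station].append(bikes)
        return {s: sum(v) / len(v) for s, v in values.items()}


# Made-up readings: (seconds since start, station, available bikes).
window = SlidingWindow(range_seconds=600)
for t, station, bikes in [(0, "A", 4), (300, "A", 6), (300, "B", 10), (700, "A", 8)]:
    window.add(t, station, bikes)

print(window.latest())
print(window.average())
```

When the reading at second 700 arrives, the reading from second 0 falls outside the 600-second range and is dropped, so both views are computed over the three remaining readings only.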
Fig. 2. City Bike Mashup Use Case

4. Future Work

It will be interesting to evaluate performance metrics of the architecture, such as query execution time as a function of the number of triples on a stream, latency, memory consumption, and CPU consumption. Furthermore, we plan to integrate a diverse range of real-time data sources such as weather information, air quality data, and traffic data to facilitate more sophisticated use cases. This will make it possible to create mashups that provide an integrated real-time perspective on previously isolated data sets, bringing together both static and streaming information. Ultimately, this will allow citizens to make well-informed decisions in the context of a smart city.
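As a rough indication of how execution time and memory consumption could be recorded as a function of the number of triples, the following sketch times a stand-in query over synthetic data of increasing size. Everything in it is hypothetical: `count_triples_per_subject` merely substitutes for a real continuous query, the synthetic triples are generated locally, and the measurements describe this Python process, not a C-SPARQL server.

```python
import time
import tracemalloc


def count_triples_per_subject(triples):
    """Stand-in for a continuous query: count triples per subject."""
    counts = {}
    for subject, _, _ in triples:
        counts[subject] = counts.get(subject, 0) + 1
    return counts


def measure(query, triples):
    """Return (elapsed seconds, peak bytes allocated) for one evaluation."""
    tracemalloc.start()
    started = time.perf_counter()
    query(triples)
    elapsed = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


def synthetic_triples(n):
    """Generate n synthetic triples spread over 10 hypothetical stations."""
    return [(f"station/{i % 10}", "availableBikes", i % 20) for i in range(n)]


results = []
for size in (1_000, 10_000, 100_000):
    elapsed, peak = measure(count_triples_per_subject, synthetic_triples(size))
    results.append((size, elapsed, peak))
    print(f"{size:>7} triples: {elapsed * 1000:.2f} ms, peak {peak / 1024:.1f} KiB")
```

Note that memory tracing slows execution, so in a careful evaluation time and memory would likely be measured in separate runs and averaged over several repetitions.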
References

[1] D. F. Barbieri, D. Braga, S. Ceri, E. D. Valle, and M. Grossniklaus, “Querying RDF Streams with C-SPARQL,” SIGMOD Rec, vol. 39, no. 1, 2010, pp. 20–26.
[2] D. Le-Phuoc, M. Dao-Tran, J. X. Parreira, and M. Hauswirth, “A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data,” in Proceedings of the 10th International Conference on The Semantic Web - Volume Part I, Berlin, Heidelberg, 2011, pp. 370–388.
[3] J.-P. Calbimonte, O. Corcho, and A. J. G. Gray, “Enabling Ontology-based Access to Streaming Data Sources,” in Proceedings of the 9th ISWC, Berlin, Heidelberg, 2010, pp. 96–111.
[4] A. Anjomshoaa, E. Kiesling, D. Tuan, D. Lam, P. Wetz, and A M. Tjoa, “Leveraging the web of data via linked widgets,” J. Serv. Sci. Res., vol. 6, no. 1, 2014, pp. 7–27.
[5] T.-D. Trinh, P. Wetz, B.-L. Do, A. Anjomshoaa, E. Kiesling, and A M. Tjoa, “Linked Widgets Platform: Lowering the Barrier for Open Data Exploration,” in The Semantic Web: ESWC 2014 Satellite Events, vol. 8798, V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, and A. Tordai, Eds. Springer International Publishing, 2014, pp. 171–182.
[6] T.-D. Trinh, P. Wetz, B.-L. Do, E. Kiesling, and A M. Tjoa, “Distributed mashups: a collaborative approach to data integration,” Int. J. Web Inf. Syst., vol. 11, no. 3, 2015, pp. 370–396.
[7] P. Wetz, T.-D. Trinh, B.-L. Do, A. Anjomshoaa, E. Kiesling, and A M. Tjoa, “Towards an Environmental Decision-Making System: A Vocabulary to Enrich Stream Data,” Advances and New Trends in Environm. and Energy Informatics, 2016, p. 19.