Enhancing the functionality of augmented reality using deep learning, semantic web and knowledge graphs: A review

Georgios Lampropoulos∗, Euclid Keramopoulos, Konstantinos Diamantaras

Department of Information and Electronic Engineering, International Hellenic University, Thessaloniki, Greece
Abstract
The growth rates of today's societies and the rapid advances in technology have created a need for access to dynamic, adaptive and personalized information in real time. Augmented reality provides prompt access to this rapidly flowing information, which becomes meaningful and "alive" as it is embedded in the appropriate spatial and temporal framework. Augmented reality thus offers users new ways to interact with both the physical and digital world in real time. Furthermore, the digitization of everyday life has led to an exponential increase in data volume; consequently, new requirements and challenges have been created, but new opportunities have arisen as well. Knowledge graphs and semantic web technologies exploit this data growth and the representation of web content to provide semantically interconnected and interrelated information, while deep learning technology offers novel solutions and applications in various domains. The aim of this study is to present how augmented reality functions and services can be enhanced by integrating deep learning, semantic web and knowledge graphs, and to showcase the potential their combination provides for developing contemporary, user-friendly and user-centered intelligent applications. In particular, we briefly describe the concepts of augmented reality and mixed reality and present deep learning, semantic web and knowledge graph technologies. Moreover, based on our literature review, we present and analyze related studies regarding the development of augmented reality applications and systems that utilize these technologies. Finally, after discussing how the integration of deep learning, semantic web and knowledge graphs into augmented reality enhances the quality of experience and quality of service of augmented reality applications so as to facilitate and improve users' everyday life, conclusions and suggestions for future research are given.
Keywords: Augmented reality, Machine learning, Deep learning, Semantic web, Knowledge graph, Human computer interaction

∗Corresponding author. Email address: [email protected] (Georgios Lampropoulos)
1. Introduction
The advent of the information era and the digitalization of everyday life, through the adoption of smart devices and advanced technologies (e.g. the internet of things (IoT), artificial intelligence, social networks, etc.), have resulted in the creation of an enormous volume of heterogeneous data and digital content, an increase in data sources and a diversification of data types, forms and structures. Moreover, the rapid growth of modern societies and the novel advances in technology have led to the emergence of users' requirements for access to rapidly flowing information in real time.

With the advancement of technology, the processing power and storage capabilities of devices have increased significantly. These smart devices are able to interconnect, communicate and interact over the Internet and are equipped with different types of sensors and actuators [1]. As a result, computer systems and smart devices are capable of rapidly retrieving, storing, processing and displaying large volumes of heterogeneous data while requiring minimal storage space and computational power. Consequently, real-time digital representation of information has become feasible, creating a more powerful way of modifying, interacting with and augmenting the environment.

Taking advantage of these technological developments and the exponentially increased data volume, Augmented Reality (AR) technology attempts to meet the above-mentioned requirements by providing real-time access to rapidly flowing information, not just quickly but mainly at the right time and in the corresponding space. Simultaneously, AR filters the information and displays only the required data in an interactive and user-friendly way so as to avoid information overload. Through AR, information becomes "alive" and meaningful as it is embedded in the appropriate spatial and temporal framework [2]. Thus, AR provides new ways for humans to interact with both the physical and digital world in real time.

One of the main advantages of AR technology is its ability to be utilized in conjunction with other innovative technologies and to exploit their individual potentials and properties. Through such combinations, AR functionality and performance can be enriched and enhanced, and optimal results can be attained. Deep learning and semantic web technologies constitute two of the most significant technologies that can reinforce AR applications and experiences. Deep learning can instill intelligence into AR systems and can be used as a means of improving computer vision. The semantic web can provide semantically interconnected information which is more easily processed and understood by machines, thus improving the overall information retrieval process. Knowledge graphs are directly connected with the semantic web, as they can acquire and integrate information into ontologies and apply a reasoner to derive new knowledge, thus enhancing and reinforcing the functionality of AR applications.

The aim of this study is to present the concept of these novel technologies in line with the potentials brought about by their combination, having as a result
the development of contemporary, user-friendly and user-centered intelligent applications. In this study, we describe both the AR and mixed reality concepts (Section 2) and briefly present deep learning technology (Section 3) as well as knowledge graphs and semantic web technology (Section 4). Based on our literature review, we present and analyze studies that developed innovative applications and systems through the use of AR in combination with deep learning and/or semantic web technologies (Section 5). Finally, after discussing the benefits and advantages that the integration of deep learning, semantic web and knowledge graphs into AR can yield (Section 6), conclusions and suggestions for future research and studies are given (Section 7).

2. Augmented Reality
In recent years, industry, enterprises, governmental organizations and the academic community have shown keen interest in AR thanks to the value and the future potential it promises to offer. The definition of this innovative technology varies [3], as some researchers have focused on the technological means and tools used to create AR environments [4, 5], while others have focused on the characteristics of these environments [2, 6, 7, 8]. Moreover, Wu et al. (2013) pointed out that, given the rapid development of the technologies and technological systems that AR applications exploit, it would be inappropriate to limit the definition to specific technologies [3].

The term AR refers to technological applications of computer units which enrich and enhance users' physical environment with additional virtual objects [9]. AR incorporates digital data (e.g. information, images, sounds, videos, interactive objects, etc.) into the real world, as perceived by the users through their senses, thus creating a mixed reality in which both real and virtual objects co-exist [2, 6, 8, 10]. In contrast to virtual reality (VR), which fully immerses users in virtual environments, AR allows users to interact with both the virtual and the real world in a seamless way [11]. Azuma (1997) provided a commonly accepted definition according to which AR is described as a technology which is interactive in real time, combines real with virtual objects and registers them in the real world [12]. Another definition, which emphasizes the technological means, describes AR as the technology which, by exploiting the capabilities of desktop and mobile computing systems, allows users to see and interact with digitally generated objects that are projected into the physical environment [4].

In each case, the main AR features, according to Azuma et al. (2001), are: i) the potential for interaction between and among users, real objects and virtual objects and ii) the combination and harmonization of real and virtual objects within the physical environment [13]. These characteristics allow the spatial and temporal correlation of information and its display in real time within the physical world as a three-dimensional overlay. Moreover, based on these characteristics, the basic requirements of an AR system can be defined: a computer system that can respond to users' inputs and generate
the corresponding graphics in real time, a display capable of combining real and virtual images, and a tracking system that can determine the user's viewpoint position [14].

Hence, AR is considered to be a modern technology that allows users to actively interact with virtual objects which co-exist with real-world objects in real time [11]. It aims at enhancing users' perception of and interaction with the physical world and at facilitating and simplifying the activities of their everyday life by providing them with virtual information, cues and objects about their immediate surroundings or indirect environment that they are not able to detect directly through their senses, thus reinforcing their sense of reality [15, 16]. Furthermore, Azuma et al. (2001) considered that AR is neither limited to a particular type of display technology, such as projective head-mounted displays (HMDs), nor to the sense of vision [13]. On the contrary, it can potentially be applied with various projection technologies and to all human senses [17, 18], while it can also be used to augment or substitute users' missing senses through sensory substitution [16]. They also pointed out that, besides the addition of virtual objects, AR applications should have the ability to remove real objects from the perceived environment. The process of object removal is considered to be a subset of AR and is called mediated or diminished reality [13].

As AR technology has become more widespread and its applications are being utilized by more and more users, several software development kits (SDKs) and platforms, such as Vuforia [19], ARCore [20], ARKit [21], Wikitude [22], ARToolKit [23], etc., have been created in order to facilitate and reinforce the development of AR applications. In recent studies, Amin and Govilkar (2015) [24] compared various AR SDKs such as Vuforia, ARToolKit, Wikitude and ARmedia [25], Kim et al. (2017) analyzed the differences between the Vuforia, ARToolKit and Wikitude SDKs [26], and Nowacki and Woda (2019) analyzed and compared the capabilities of the ARCore and ARKit platforms [27].

Moreover, AR headsets are becoming more popular after entering the consumer market. These devices enable users to view and interact with virtual objects or holograms projected onto the real world. Additionally, they capitalize on the ability of AR technology to provide hands-free interfaces by offering a ubiquitous and immersive view of virtual content, thus eliminating users' need to shift context in order to interact with a device while carrying out tasks in the real world [28]. HoloLens [29], Magic Leap [30], Meta 2 [31] and Vuzix Blade [32] are some examples of AR headsets.
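As a concrete illustration of the basic AR requirements outlined above (real-time graphics generation, a display combining real and virtual imagery, and viewpoint tracking), the following minimal sketch implements a marker-based AR loop with the legacy ArUco module of opencv-contrib-python. It is a sketch only: the camera intrinsics are hard-coded placeholders that a real system would obtain from calibration, and none of the SDKs or headsets named above are implied to work this way internally.

```python
import cv2
import numpy as np

# Placeholder intrinsics; a real system would obtain these from a
# calibration step (e.g. cv2.calibrateCamera).
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

dictionary = cv2.aruco.Dictionary_get(cv2.aruco.DICT_6X6_250)
parameters = cv2.aruco.DetectorParameters_create()

cap = cv2.VideoCapture(0)  # tracking input: the device camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Tracking: locate fiducial markers in the current frame.
    corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary,
                                              parameters=parameters)
    if ids is not None:
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, 0.05, camera_matrix, dist_coeffs)  # 5 cm markers
        for rvec, tvec in zip(rvecs, tvecs):
            # Registration and display: draw virtual axes onto the real frame.
            cv2.aruco.drawAxis(frame, camera_matrix, dist_coeffs,
                               rvec, tvec, 0.03)
    cv2.imshow("minimal AR loop", frame)  # combined real + virtual view
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

The three requirements map directly onto the loop: the camera and pose estimation provide tracking, the drawing call generates graphics in real time, and the displayed frame combines real and virtual imagery.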
2.1. Mixed Reality

Milgram and Kishino (1994) presented the concept of the "reality–virtuality continuum", where the real environment lies at one end and a completely virtual environment at the other. According to Milgram, mixed reality lies between these two ends and consists of AR and augmented virtuality [33]. AR technology is closer to the real-environment end, as the predominant perception conveyed to the users is the real world augmented with virtual objects (e.g. sounds, images, computer graphics, etc.). Augmented virtuality technology is closer to the virtual-environment end and refers to the augmentation of the virtual world with real objects for greater exactness, authenticity and
realism. Therefore, the mixed reality environment is a space where objects of the physical and virtual world are presented together in a unified depiction, anywhere between the two ends of the "reality–virtuality continuum", and are not treated as distinct points. Consequently, the limits of what exactly we perceive as real and as virtual in a mixed reality environment are not entirely clear and distinct [34, 35].

3. Deep Learning
Owing to the increase in data volume and computational power, not only have new requirements and challenges been created but new opportunities have arisen as well. Deep learning is an innovative scientific field which attempts to exploit these new conditions by offering novel solutions and applications. The term "deep" refers to the multitude of layers through which the data is transformed. Deep learning enables "computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction" [36]. More specifically, deep learning is a specialized form of machine learning which learns to represent the real world as nested hierarchies of concepts, with each individual concept being defined in terms of other, simpler and more abstract concepts and representations. In effect, deep learning models simulate the human way of learning by learning through examples. Deep learning techniques are defined by the use of multi-layer neural networks, as well as advanced supervised and unsupervised learning methods [37].

Deep learning models are characterized by high levels of flexibility. Additionally, they are able to enhance the efficiency and effectiveness of the overall system, as they utilize the backpropagation algorithm to indicate how their internal parameters should be changed in order to discover intricate structure in large data sets [36], and they are capable of automatically and autonomously identifying and using the optimal combinations of complex input data [38]. This ability allows the development of autonomous, human-like decision-making systems.

Another key feature of deep learning is its dependence on large data volumes to expand its potential. Since models are trained through examples, their efficiency improves as the amount of processed data and the number of input examples increase. Even though datasets and training data are publicly available on the Internet and many more can be manually or artificially created [39], the acquisition of appropriate data for neural network training still remains one of the most common challenges [40]. As deep learning methods evolve, they are able to discover intricate structures in high-dimensional data and to take advantage of the increased volume of available computation and data more easily and efficiently [36]. Hence, the Big Data era can be expected to offer enormous potential for innovation in deep learning.

Deep learning is applicable to various domains, as it offers numerous useful and innovative applications, solutions and services, and it will be used even more successfully in the near future as more and more advanced algorithms
and architectures are being developed [36]. Some of the key advantages of deep learning technology and techniques are the following (a minimal training sketch follows the list):
• End-to-end problem solving;
• Short testing time;
• Automatic discovery and extraction of the main features;
• Effective solving of complex problems (e.g. object classification, natural language processing, etc.);
• Higher clarity of the overall system structure;
• Easy model reusability;
• Enhanced system scalability.
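Before turning to application domains, the following minimal, self-contained sketch makes the mechanism described above concrete: a small multi-layer network whose internal parameters are adjusted by backpropagation, implemented in Keras. MNIST, the layer sizes and the epoch count are illustrative choices, not recommendations drawn from the cited literature.

```python
import tensorflow as tf

# A small multi-layer model: each layer learns a progressively more
# abstract representation of its input.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Compiling selects the loss whose gradients backpropagation uses to
# indicate how the internal parameters of every layer should change.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Learning through examples: MNIST stands in for any labeled dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
model.fit(x_train / 255.0, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test / 255.0, y_test))
```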
Subsequently, the significant increase in computing capabilities and in the available data volume, the recent developments in signal and information processing, as well as the above-mentioned factors have led to the development of innovative deep learning applications in various domains. Some of the most widespread application domains of deep learning technology are:
• Computer vision
• Object detection and recognition
• Speech recognition
• Natural language processing (NLP)
• Social computing
• Sentiment analysis
In particular, the deep learning architectures and models most commonly used in the domain of object detection and recognition are listed below (a brief inference sketch follows the list):

• LeNet-5 [41]
• Convolutional Network (CNN or ConvNet) [42]
• Region-based Convolutional Network (R-CNN) [43]
• Faster Region-based Convolutional Network (Faster R-CNN) [44, 45]
• You Only Look Once (YOLO) [46]
• ResNet [47]
• Region-based Fully Convolutional Network (R-FCN) [48]
• Neural Architecture Search Net (NASNet) [49]
• Single Shot MultiBox Detector (SSD) [50]
• Mask Region-based Convolutional Network (Mask R-CNN) [51]
• DenseNet [52]
• RetinaNet [53]
• EfficientNet [54]
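As an illustration of how such pretrained architectures are typically consumed in practice, the sketch below runs a COCO-pretrained Faster R-CNN from torchvision on a single image. The file name and the 0.8 confidence threshold are assumptions made for the example, and any of the listed architectures with a public implementation could be substituted.

```python
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Load a detector pretrained on COCO and switch to inference mode.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("scene.jpg").convert("RGB")  # illustrative input path
tensor = F.to_tensor(image)

with torch.no_grad():
    predictions = model([tensor])[0]  # dict with boxes, labels and scores

# Keep only confident detections; the threshold is an arbitrary choice.
for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.8:
        print(int(label), float(score), [round(v, 1) for v in box.tolist()])
```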
4. Knowledge Graphs and Semantic Web Technologies
4.1. Knowledge Graphs

The needs and requirements of an ever-growing information industry, the advent of linked open data sources and the introduction of knowledge graphs have rekindled interest in research on graph-based knowledge representation, which plays a major role in modern knowledge management. Thanks to simple and highly normalized data models, a multitude and a wide range of information sources can be accommodated, resulting in the development of large-scale knowledge graphs in industry, on the web and in research [55]. The term knowledge graph was popularized by Google in 2012 [56] and referred to the use of semantic knowledge to further enhance the quality of their search results.

In a graph-based knowledge representation, data is enriched with contextual information; the nodes of the graph, named entities, are connected through relations which are regarded as the edges of the graph and whose connectivity is used for knowledge representation [57, 58]. According to Yan et al. (2018), a knowledge graph focuses on graphical structure and can be considered a semantic graph consisting of vertices, which represent concepts and entities, and edges. More specifically, concepts refer to general categories of objects, entities are real-world physical objects and edges represent the semantic relationships between concepts and/or entities [59]. Through the connection of concepts and/or entities, complete and structured knowledge repositories are formed, thus facilitating the management, retrieval, usage and understanding of information [59].

Hence, a knowledge graph can be considered "a graph of data with the intent to compose knowledge" [60]. Furthermore, it combines data and semantics in a graph structure, aggregates information from various sources and generates new knowledge through analysis and reasoning. A knowledge graph is a graph consisting of a set of labeled links between entities (vertices) using unambiguous identifiers, denotations and statements, in which meaning is encoded in its structure, a limited set of relation types is used and explicit assertion provenance is included [60, 61, 62]. Moreover, based on Paulheim (2017), knowledge graphs can: i) describe real-world entities and their interrelations, organized in a graph, ii) define entity classes and relations in a knowledge schema, iii) make it possible to interrelate arbitrary entities without restrictions on their domains and/or ranges and iv) cover a multitude of domains. Additionally, he noted that the above-mentioned characteristics can be used as a minimum set of criteria
to separate knowledge graphs from other collections of knowledge [58]. Ehrlinger and Wöß (2016) identified the collection, extraction and integration of information from external sources as a further essential characteristic which extends a pure knowledge-based system [63]. In addition, a knowledge graph "acquires and integrates information into an ontology and applies a reasoner to derive new knowledge" [63]. Based on this definition, knowledge graph schemas can be regarded as ontologies and are connected to semantic web technologies [64]. To sum up, a knowledge graph that crawls the entire web could be interpreted as a self-contained semantic web; hence, the semantic web can be considered the most comprehensive knowledge graph [63].
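The following toy sketch, built with the rdflib library, illustrates the graph-of-triples model described above: entities appear as nodes, relations as labeled edges, and a SPARQL query traverses the edges to retrieve connected knowledge. The namespace, entities and relations are invented for illustration, and no reasoner is run here.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # illustrative namespace

g = Graph()
# Each triple is a labeled edge between two nodes of the graph.
g.add((EX.MonaLisa, EX.createdBy, EX.LeonardoDaVinci))
g.add((EX.MonaLisa, EX.displayedAt, EX.Louvre))
g.add((EX.Louvre, EX.locatedIn, Literal("Paris")))

# A query traverses the labeled edges to retrieve connected knowledge.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?artwork ?place WHERE {
        ?artwork ex:displayedAt ?museum .
        ?museum ex:locatedIn ?place .
    }
""")
for artwork, place in results:
    print(artwork, place)
```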
4.2. Semantic Web
Knowledge management involves knowledge acquisition, access and maintenance. From this perspective, current technology imposes restrictions on searching, exporting, maintaining and displaying information [65], as computers have no reliable way to process semantics, since most of today's web content is designed to be understood by humans rather than meaningfully manipulated by computer programs [66]. Nonetheless, the use of this content in its current representation, in combination with the development of increasingly complex methods based on artificial intelligence and computational linguistics, can bring about solutions to these limitations. Intelligent techniques should be utilized as an alternative approach so as to exploit the representation of web content in a machine-processable format. This approach is named the semantic web and constitutes an extension of the already existing web [65, 66].

The semantic web is a research and development field of computer science which aims at enabling computers to process the huge volume of data and information of the World Wide Web (WWW) and at using semantic information about data and services to enhance the utility and usability of the web [67]. It rests on powerful core pillars such as linked data and provides a common framework which allows data sharing and reusability, thus permitting computers not only to read data and information but also to understand their content. More specifically, the semantic web is a web of actionable information, that is, "information derived from data through a semantic theory in order to interpret the symbols" [68]. Moreover, interoperability among systems can be established by using the semantic theory, which utilizes the logical connection of terms with a view to providing a description of "meaning" [68]. In computer science, the term "semantics" refers to the meaning of languages and not to their syntax. Semantics provides the rules for interpreting the syntax, which do not convey the meaning directly but restrict the possible interpretations of what is stated. More specifically, semantics is the interpretation of a phrase or, in other words, the mapping of symbols to meanings.

According to Berners-Lee et al. (2001), some of the main aims of semantic web technology are [66]:

• to create a means that expresses both data and rules;
• to allow rules to be exported from any existing knowledge-representation system onto the web;
• to structure the meaningful content of webpages;
• to give well-defined meaning to information;
• to provide data and information that can be automatically processed.
According to Antoniou and van Harmelen (2004), another basic aim of semantic web technology is the development of enhanced knowledge management systems, in which [65]:

• Knowledge will be organized into conceptual areas according to its meaning;
• Tools will support automated knowledge maintenance;
• Human-friendly queries will be utilized for the searching process;
• Collective information will be retrieved;
• Access to specific information segments will be feasible.
Computers must be able to access structured collections of information and sets of inference rules in order to utilize semantic web functions and conduct automated reasoning [66]. Moreover, the use of open data is a basic prerequisite for the implementation of semantic web technology and the achievement of its objectives. Open data, or data on the web, is information that is accessible and usable by all users.

Collections of information called ontologies are considered a basic component of the semantic web. An ontology is "an explicit specification of a conceptualization" [69] and, more specifically, "a formal, explicit specification of a shared conceptualization that is characterized by high semantic expressiveness" [70]. Additionally, ontologies are commonly used as knowledge bases since they allow semantic modeling of knowledge [63]. They identify the relations among terms, include a set of inference rules and a taxonomy, and can improve the accuracy of web searches, associate knowledge structures and inference rules with information on webpages and handle complicated questions, thus enhancing web functionality [66].
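As a sketch of how a linked open data source is queried in practice, the snippet below sends a SPARQL query to the public DBpedia endpoint using the SPARQLWrapper library. It relies on DBpedia's predefined dbo:/dbr:/rdfs: prefixes, and the exact class and property names are assumptions that a production application would need to verify against the dataset.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# DBpedia is a typical linked open data endpoint.
endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setQuery("""
    SELECT ?museum ?name WHERE {
        ?museum a dbo:Museum ;
                dbo:location dbr:Paris ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    }
    LIMIT 5
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["museum"]["value"], "-", binding["name"]["value"])
```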
5. Related Work
After briefly going through AR, deep learning, semantic web and knowledge graphs, we now present and analyze related applications and systems which combine AR with these technologies.

In their study, Van Aart et al. (2010) utilized location-aware mobile devices and semantic web techniques in order to search large collections of general and cultural heritage information repositories and acquire knowledge with minimal user interaction [71]. In their approach, they combined specialized knowledge
about specific domains (e.g. cultural heritage) and general knowledge about points of interest (POIs) and geo-locations. Moreover, they showcased three scenarios in which, given the specific geo-location of end users, they provided useful and dynamic information and resources (e.g. about the current environment, events, etc.) which derived from, and were enriched by, various linked open data sources [72]. Furthermore, with the aim of presenting the annotated data and information in an interactive way, they combined AR with facet selection. Finally, they showed that an informative location-based service can be developed by combining the resources of various linked open data sources.

Nixon et al. (2012), in the context of their SmartReality project, described how the semantic web and linked open data can be incorporated into AR platforms to enhance users' experience by providing dynamic and useful information and content about their surroundings and presenting it in a more meaningful and practical manner [73]. Their approach focused on a music scenario aimed at enriching users' experience of music in their vicinity by using linked open data and semantics to dynamically link music references to virtual information, content and resources, and by presenting them in an interactive way through AR enhanced by the web of data and the web of services. Finally, in order to evaluate their prototype, they conducted an experiment involving ten (10) participants, in which they augmented a physical street poster at a public transportation stop in Vienna with virtual content and services through the use of their SmartReality platform.

Hervás et al. (2013) proposed an architecture named intelligent augmented reality architecture (i-ARA) which exploited the semantic axioms of the web ontology language (OWL) [74] and the expressiveness of the semantic web rule language (SWRL) [75] and utilized semantic web principles (e.g. ontological context models) [76]. In addition, they emphasized user personalization and context-awareness with a view to supporting users' daily needs through simple interaction with the environment from an augmented-reality perspective. In their approach, they focused on defining mechanisms to deploy AR services that support intelligent decision-making functions so as to determine which of the available information should be retrieved and presented. They tested and evaluated their prototype system through interviews and user studies. More specifically, their experiment involved twenty (20) users over a period of four (4) days, and their evaluation focused on users' experiences. Furthermore, in order to assess the use of augmented objects to obtain AR-based information adapted to users' needs, they utilized a subset of the MoBiS-Q questionnaire [77], in which they applied a Likert-type scale.

Matuszka and Kiss (2014) presented a system which combined AR and semantic web technologies [78]. AR was utilized to display context-aware information, and the semantic web was used to store their knowledge base in a uniform format and link it with public datasets. More specifically, their system allows navigation through a specific location and provides information and details about the sights in an interactive manner, thus helping users become familiar with the location's characteristics. As a proof of concept, they described the details of their system through a case study, selecting the Hungarian
Kerepesi Cemetery as the venue of their application.

Akgul et al. (2016) carried out a study in order to tackle some AR detection problems by modifying current deep learning architectures [79]. They reviewed similar state-of-the-art detection methods for AR tracking. Furthermore, they introduced their deep CNN detector called DeepAR, which was based on the well-known CNN architecture AlexNet [80] and followed feature-based detection approaches. They also tested and compared their proposed algorithm with another detection method called ORB [81], which is commonly used in OpenCV [82], by using two-dimensional (2D) image targets and implementing the matching algorithm HIPS [83]. They came to the conclusion that DeepAR outperformed ORB, as it was specifically designed and optimized for this type of detection task.

Limmer et al. (2016) proposed a deep multi-scale CNN-based approach capable of predicting and localizing the road course at night using camera sensors and deep learning techniques [84]. For their dataset, they used 7095 full-scene-labeled near-infrared (NIR) images showing road scenes at night in various weather conditions, seasons and landscapes. They based their proposed framework on [85] but replaced the optical lane detection module used in [86] with a road segmentation module based on [87, 88]. Moreover, they assessed the classification performance of diverse network topologies and compared the optical map generated by the best-performing classifiers with the standard optical map shown in [86]. Although their approach performed somewhat worse in optimal road and weather conditions compared to other lane-based algorithms, it performed consistently well in situations without lane markings and/or in adverse weather conditions, thus increasing the robustness and availability of shorter-distance road course estimations.

Kim et al. (2017) carried out a study in which they developed a mobile AR system utilizing semantic web technologies to provide contextual information about cultural heritage sites [89]. The aim of their application was to provide users with relevant and useful tangible and intangible information (e.g. events, persons, etc.) about a target heritage site and broaden their knowledge. Their information modeling framework consisted of: i) aggregating heterogeneous cultural heritage data from five (5) different web databases, ii) semantically linking web resources and iii) modeling a user-centered information ontology utilizing the Korean Cultural Heritage Data Model. To assess their proposed approach, they focused on recognizing three specific POIs in a target heritage site, namely Inheongjeon Hall and its vicinity in Seoul. With a view to evaluating their information model and AR application design and assessing the engagement and learning impact of their proposed system, they conducted user studies both in laboratory and outdoor environments. The results of their evaluation showed that their application provided a pleasant and acceptable user experience in terms of its affective, cognitive and operative features.

In their study, Schröder and Ritter (2017) proposed a system which generates context-sensitive feedback and uses HMDs with the aim of
observing and recognizing users' actions [90]. In order to accurately detect and localize hand-object interactions, they adopted the approach of [91], applied deep learning techniques and trained a fully convolutional network (FCN) using the VGG-16 architecture [88]. Furthermore, they stated that, provided that suitable training data and state-based descriptions of the target activities are given, their action recognition approach can be applied to goal-oriented tasks in general.

Abdi and Meddeb (2017) presented a new approach for real-time traffic sign recognition (TSR) based on cascaded deep learning and AR [92]. In order to reduce computational region generation, they proposed a TSR method based on the Haar cascade. Their algorithm starts with the simpler Haar cascade detection algorithms to filter out obvious non-traffic signs and brings in the more sophisticated deep neural networks only in the final stages, with the aim of achieving not only robust and fast detection but also a good balance between accuracy and complexity. More specifically, their method employs the following three (3) steps: i) hypothesis generation, ii) verification and iii) augmentation. Moreover, they compared their method with other state-of-the-art algorithms, such as the committee of CNNs [93], human performance [94], multi-scale CNNs [95] and random forests [96], on the German traffic sign recognition benchmark (GTSRB) dataset [97]. They stated that, according to the results of their comparison, their approach reached performance comparable to other state-of-the-art approaches while attaining better results in certain categories, such as "OtherProhibitions" and "Mandatory", and achieving lower computational complexity and shorter training time.
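The two-stage pattern described above (a cheap proposal stage followed by a more expensive verification stage) can be sketched as follows with OpenCV and Keras. The cascade file, the classifier model and the thresholds are placeholders invented for this example; they are not artifacts of the cited work, whose models are not public here.

```python
import cv2
import numpy as np
import tensorflow as tf

# Stage 1: a cheap Haar cascade proposes candidate regions.
cascade = cv2.CascadeClassifier("sign_cascade.xml")  # placeholder file
# Stage 2: a CNN classifier verifies the surviving candidates.
cnn = tf.keras.models.load_model("sign_classifier.h5")  # placeholder model

frame = cv2.imread("road.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in candidates:
    patch = cv2.resize(frame[y:y + h, x:x + w], (32, 32)) / 255.0
    probs = cnn.predict(patch[np.newaxis])[0]
    if probs.max() > 0.9:  # only confident verifications survive
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, str(int(probs.argmax())), (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("road_annotated.jpg", frame)
```

The design rationale is that the cascade rejects the overwhelming majority of windows at negligible cost, so the network is only invoked on the few regions where its accuracy matters.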
Rao et al. (2017) carried out a study in which they proposed a method to achieve fast, robust and markerless mobile augmented reality (MAR) in uncontrolled outdoor environments. They used the device's built-in global positioning system (GPS), inertial measurement unit (IMU) and magnetometer as the basis for geovisualization and interaction [98]. In order to ensure system robustness under poor signal conditions, their method is independent of the network. Utilizing MXNet [99] for visual object detection and with the aim of reducing computing cost, they implemented a modified version of the single shot detector (SSD) [50] which used a truncated SqueezeNet [100] architecture. With a view to evaluating their method and validating its results, they tested their prototype system on the Wuhan University campus on ten (10) selected geographic objects. Finally, they reported that their method achieved high detection accuracy and stable geovisualization results.

Contreras et al. (2017) combined the semantic web and AR to provide a mobile application that searches for places, people and events within a university campus with a high degree of query expressiveness and an enhanced user experience [101]. Their system followed a client-server architecture in which the client side utilizes AR to enhance users' interaction and the server side uses a semantic model (ontology), an "Indexing" component, a "Query Creation and Execution" component and a "Results Classification and Ranking" component for the searching process. Their semantic model was named "University of Cuenca Ontology" and was created following the NeOn methodology
proposed by Suarez-Figueroa [102]. Moreover, they presented the challenges they encountered during the three main phases of their searching process and validated their approach with a use case example.

Polap et al. (2017) focused on interpreting a reality-based real-time environment evaluation in order to help users stay alert to and detect impending obstacles [103]. Their approach put emphasis on analyzing the surrounding environment through diverse sensors to prevent collisions with oncoming objects. Their hybrid architecture uses AR techniques, deep learning algorithms and a dedicated method of data extraction from samples obtained in real time to detect and estimate as much of the incoming information as possible.

In their study, Katsaros and Keramopoulos (2017) presented a prototype knowledge-based application named FarmAR, which exploits AR to identify specific plants and provides useful information through the use of the semantic web [104]. Based on the information obtained from the PlantVillage online community, they developed an ontology which described two (2) types of plants and five (5) types of diseases and contained five (5) instances of plants. They used predefined SPARQL queries to retrieve information from their ontology. They stated that for the recognition of plant images they used Vuforia's Target Manager; as a result, their prototype only recognizes images that are included in their Vuforia Target Manager database. In a more recent study, Katsaros et al. (2017) improved the above-mentioned prototype by utilizing sensors to retrieve useful information about a specific area, such as temperature, humidity, soil pH, etc. [105]. They noted that their improved prototype helps farmers improve the conditions prevailing in the cultivation area while simultaneously contributing to the calculation of the crop index metric from sensor data.

Lin et al. (2018) conducted a study in which they proposed and showcased a campus tour application utilizing advanced AR technologies, such as computer vision and object recognition [106]. Furthermore, they used deep learning to improve the object recognition ability and positioning techniques so as to reduce search delays. In their approach, they used the scale-invariant feature transform (SIFT) [107] to extract the features of the campus attractions before storing them in a database. The architecture of their application consists of three major components: a search engine, a database and an output engine. They reported that their application achieved an attraction recognition accuracy rate of 90%, so that the attractions can be easily identified by their appearance and characteristics. However, they also stated that the use of three-dimensional (3D) models with a CNN was significantly delayed when running on mobile devices.

In their study, Wang et al. (2018) [108] described an AR system which was based on ORB-SLAM2 [109] and used custom-designed rescue robots in urban search and rescue (USAR) tasks, whose software was built using the robot operating system (ROS) [110]. In order to obtain the position and posture of the robot, they used simultaneous localization and mapping (SLAM) through the sensor of a Kinect RGB-D camera mounted on it. Furthermore, they demonstrated deep learning methods for object ("victim") detection and localization through the use of the semi-supervised algorithm
one-shot video object segmentation (OSVOS) [111]. As they stated, some of their system's limitations are: i) the use of a single AR marker, ii) the sparse map points that ORB-SLAM2 can sometimes generate and iii) the incorrect segmentation results that OSVOS produces when there is no target in the image. They reported that their experimental results show that their proposed system can enhance human-robot interaction (HRI) and can facilitate operators' task completion by specifying the location of the object ("victim") at any given time.

Subakti and Jiang (2018), in the context of Industry 4.0 and with a view to interacting with machines in indoor smart factories, designed, developed and implemented a fast and markerless indoor AR system using MobileNets [112]. Their method was implemented using a supervisory control and data acquisition (SCADA) system [113] through IoT technology so as to report machine status in real time to a cloud-side SCADA server. They also utilized a smartphone equipped with depth cameras which supports the TensorFlow deep learning engine [114] and Google Tango software. Furthermore, they used the distance between the user and the machine to determine the level of detail of the augmented information. Their prototype system was able to recognize three different kinds of industrial machines and five kinds of portions of a certain industrial machine. Finally, they stated that their system achieved unique interaction modes, high accuracy and intuitive visualization.
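To illustrate the kind of lightweight network such a system builds on, the following sketch runs an ImageNet-pretrained MobileNet in Keras on a single image. The image path is a placeholder, and a system like the one described would fine-tune the network on its target machines rather than rely on ImageNet classes.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.mobilenet import (
    MobileNet, decode_predictions, preprocess_input)

# ImageNet-pretrained MobileNet: small enough for mobile inference.
model = MobileNet(weights="imagenet")

image = tf.keras.preprocessing.image.load_img("machine.jpg",  # placeholder
                                               target_size=(224, 224))
x = preprocess_input(np.expand_dims(
    tf.keras.preprocessing.image.img_to_array(image), axis=0))

preds = model.predict(x)
for _, label, score in decode_predictions(preds, top=3)[0]:
    print(label, round(float(score), 3))
```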
In his study, Lalonde (2018) stated that in order to achieve realism when combining virtual with real objects, the same characteristics must be shared between them [115]. He identified computer vision tasks such as camera localization, object tracking and illumination estimation as the key challenges to be solved in order to attain realistic results. Additionally, he analyzed how these challenges could be solved robustly and accurately by utilizing deep learning techniques, and he presented a real-time temporal six-degrees-of-freedom (DOF) object tracking and illumination estimation method.

Aliprantis et al. (2018) proposed a concept for integrating linked open data sets into AR applications for cultural heritage institutions [116]. Their concept implements an indoor tracking method based on image identification and matching between linked open data cloud images related to cultural artifacts and frames from the user's mobile device camera, and it aims at eliminating the need to accurately pinpoint the user's exact location. They argued that the images stored in the linked open data cloud could be used as fiducial markers for AR tracking techniques and that this feature could be exploited to track users' POIs in indoor environments. More specifically, they aimed at determining which artifact users are currently examining through their mobile device camera without requiring the users' accurate location. Finally, they stated that although many issues currently remain and a great amount of computing power is required to perform the matching function, in the future this may become an AR tool that incorporates linked open data functionalities.

With the aim of helping users explore large-scale neuron morphological databases, Li et al. (2018) developed a deep learning based feature representation method [117]. Specifically, their approach employed binary coding to compress
feature vectors into short binary codes and used an unsupervised deep neural network for training. In order to further enhance the accuracy of the representation, they fused deep features with hand-crafted features. Additionally, they validated the efficacy of their framework on a public data set including 58,000 neurons, which showed promising retrieval precision and efficiency in comparison with other state-of-the-art methods. Finally, by using AR techniques, they developed a neuron visualization application which helps users explore and analyze neuron morphologies in an interactive manner.

Englert et al. (2019) presented and evaluated a web service which offers cloud-based machine learning services to improve AR applications [118]. More specifically, they focused on mobile and web clients with special demands regarding tracking quality and the registration of complex scenes that require an application-specific coordinate frame. Their service aimed at allowing easy integration of state-of-the-art machine learning techniques into new and existing AR applications and at reducing the camera drift that still occurs in modern AR frameworks. Additionally, they showcased real-world applications that utilize their web service and evaluated its performance and accuracy. Finally, they highlighted how their cloud-based segmentation approach, using an appropriately trained CNN, can improve the accuracy of MAR applications and allow developers to use a context-based coordinate frame to describe a content scene.

With a view to enhancing users' understanding, interaction and exploitation of medicine information, Flores-Flores et al. (2019) developed an application named ARLOD [119]. This application integrates the collection of information from resource description framework (RDF) linked open datasets belonging to the field of medicine, through SPARQL queries, into a MAR application. For the development of this AR application the Vuforia SDK was used, and specific medicine packaging images were utilized as fiducial markers. They stated that in order to attain simpler scalability and maintenance, their application architecture has a layered design, with tasks distributed throughout all the layers. Specifically, the presentation, integration, AR and semantic layers are the main layers of this application. Finally, the application utilized two external modules, one for extracting information and one as a repository for storing 3D models.

6. Discussion
The integration of deep learning, semantic web and knowledge graphs into augmented reality can indeed yield numerous real-time, interactive, user-centered and user-friendly intelligent applications which focus on enhancing the quality of experience and quality of service for end users (Figure 1). It can be inferred that more and more applications and systems utilizing these technologies are being invented and developed.

Figure 1: The relationship among augmented reality, deep learning, semantic web and knowledge graphs.

Moreover, AR applications are increasingly used in everyday life and in various application domains. Novel AR applications can provide users with new ways to interact with both the physical and digital environment, as well as with smart solutions that facilitate everyday life and daily tasks. They offer users new
ways to communicate and stay informed while simultaneously contributing to the creation of interactive experiences.

Deep learning technology aims at instilling intelligence into systems with a view to enhancing and improving their efficiency and effectiveness. Object detection, image processing and computer vision are three of the main fields in which it is used. Due to rapid technological developments, the creation of new algorithms and architectures, and the exponential increase in data volume, the effectiveness of deep learning in these fields has improved. These fields comprise key factors in many AR applications. Hence, the use of deep learning technology and techniques in combination with AR can lead to the development of more efficient, interactive and intelligent applications.

The semantic web is another technology which is increasingly applied in various cases. It allows the use of semantically interconnected information whose content is more easily processed and understood by machines. It permits and facilitates the use of linked open data in order to provide dynamically changing and conceptual information. Moreover, it supports the use of more complex unstructured queries that are closer to the users and promotes the concept of open data as well as data sharing and reusability. Knowledge graphs, in turn, are directly connected with the semantic web, which can be regarded
as the most comprehensive knowledge graph, and they help cope with and fulfill the needs and requirements of an ever-growing information industry. Knowledge graphs combine data and semantics in a graph structure, aggregate information from various sources and generate new knowledge through analysis and reasoning. In this context, information is interconnected and interrelated, and data is enriched with contextual information. As a result, complete and structured knowledge repositories are formed, and information generation, retrieval, management, usage and understanding are facilitated. As such, and owing to the quantity of interlinked information as well as the dynamic selection and integration of data, the integration of the semantic web and knowledge graphs into AR can keep users better informed by providing conceptual and personalized data in real time.
6.1. Integrating Semantic Web and Knowledge Graphs into Augmented Reality

The integration of the semantic web and knowledge graphs into AR helps overcome specific limitations. The semantic web renders handling data from heterogeneous sources feasible, and its principles are well suited to organizing content for AR applications. Its features enable AR applications to utilize dynamically changing information rather than information retrieved from static databases. Additionally, it supports ontologies and knowledge graphs as well as the SPARQL query language, which enables users to conduct queries in a more user-friendly manner. Furthermore, the semantic web makes use of linked open data; as a result, it can enhance AR by enabling the use of a wide variety of contextual data, by dynamically selecting and integrating data sources and by allowing users to experience web-like browsing in AR applications [120, 121]. In addition, it can exploit sensor data obtained from users' devices more effectively by defining the query for the desired data more precisely [116].

The vast volume of semantically interrelated and interconnected information offers many merits for AR applications, but it also requires effective filtering of the displayed data, based on the overall context and users' needs and requirements in real time. It should also be noted that in MAR applications users are looking at small screens, so appropriate information filtering is essential in order not to overwhelm users with unnecessary information. On the other hand, due to the nature of linked data, AR applications are able to display only the most relevant information so as not to confuse users, while also allowing them to find additional information by following the semantically interconnected links without having to switch to another application [121].

Due to the open data nature of the semantic web and linked open data, information can be both accessed and edited by all users, which raises quality, trust and privacy issues [122]. Additionally, because of this and the heterogeneity of the datasets, issues of data ambiguity, duplication, misclassification, mismatching, overlapping and enrichment, along with differences in vocabulary and ontology, may occur [116, 123]. The latter may be resolved by using ontology matching techniques [124]. It is worth noting that the semantic web and knowledge graphs also benefit from being integrated into AR, as the rising amount of content generated by AR applications increases both the volume and diversity of available
information, which can also be linked to the linked open data cloud [121]. In addition, AR provides interactive and immersive interfaces which can display semantic web information and linked data in a more user-friendly manner; as a result, information becomes more meaningful to users.
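A deliberately simple, invented sketch of the filtering idea discussed above follows: candidate annotations are restricted to the user's interests, ranked by distance and capped so that only a few reach the small screen. The POIs, coordinates and thresholds are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371000 * asin(sqrt(a))

def select_annotations(pois, lat, lon, interests, radius_m=200, max_items=5):
    """Keep nearby POIs matching the user's interests, nearest first."""
    nearby = [(haversine_m(lat, lon, p["lat"], p["lon"]), p)
              for p in pois if p["category"] in interests]
    nearby.sort(key=lambda t: t[0])
    return [p for d, p in nearby if d <= radius_m][:max_items]

# Invented sample data: two POIs and a user interested in museums.
pois = [{"name": "City Museum", "category": "museum",
         "lat": 40.6404, "lon": 22.9444},
        {"name": "Hardware Store", "category": "shop",
         "lat": 40.6410, "lon": 22.9450}]
print(select_annotations(pois, 40.6401, 22.9440, {"museum"}))
```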
6.2. Integrating Deep Learning into Augmented Reality

Based on the articles published in the first decade of the International Symposium on Mixed and Augmented Reality (ISMAR), Zhou et al. (2008) reviewed some of the AR limitations that need to be addressed, namely tracking techniques (e.g. sensor-based, vision-based and hybrid), AR displays (e.g. see-through HMDs, projection-based and handheld displays) as well as interaction techniques and user interfaces (e.g. hybrid AR interfaces, tangible and collaborative AR) [11]. Kim et al. (2018) summarized the advances made towards eliminating these limitations, and in the AR field in general, during the second decade of ISMAR [125].

Moreover, AR applications focus increasingly on mobile devices and aim at achieving realism when combining the virtual world with the real world in real time. Hence, several computer vision challenges, such as camera localization, object and user input tracking and illumination estimation in rapidly changing scenes, still need to be addressed more efficiently [115]. Several SLAM-based techniques, which compare visual features between frames, are used in various AR applications for camera localization, environment mapping and object tracking in order to solve some of these issues [79]. By combining additional sensor data (e.g. gyroscope, GPS and accelerometer data), the overall tracking effectiveness is further increased. In addition, deep learning methods can be used to address several of the computer vision related limitations currently existing in AR more effectively and robustly [126]. Furthermore, deep learning can further enhance AR applications with regard to tracking quality, registration of complex scenes and overall accuracy [118]. Specifically, complex recognition tasks have been successfully carried out using various deep learning methods and architectures.

With a view to showcasing the potential yielded by integrating machine learning and deep learning into AR, Englert et al. (2019) cited several studies in which CNNs were utilized to perform computer vision tasks that can also be applied in AR scenarios [118]. These studies involved the use of CNNs for high-accuracy image classification tasks [80, 88, 127], context-based image segmentation [91, 128] as well as camera relocalization [129, 130, 131]. Moreover, deep learning facilitates the development and adoption of AR because deep learning architectures: i) are trained on very large data volumes and can thus be robust to a wide variety of capture conditions (e.g. motion blur, color shifts, etc.), ii) possess very efficient graphics processing unit (GPU) implementations that can run on mobile GPUs in real time given a small enough network, ensuring the low overall system latency that is a prerequisite for AR applications, and iii) are able to automatically learn object-specific features from data [115]. Additionally, since in AR applications the camera is always active, collecting image data and useful information from the environment, computer vision tasks such as object detection, text processing, etc. can always be applied. Hence,
computer vision constitutes a basic component of AR, and many of the features already provided by AR SDKs and frameworks are powered by machine and deep learning methods. Thus, by integrating deep learning into AR, AR can be enriched and its range of possible use cases can be greatly expanded.

Furthermore, some degree of artificial intelligence, and systems that are able to detect and interpret human inputs such as hand movements and gestures, speech and eye movement, are required in order to create diverse forms of interactive AR experiences and applications. Through deep learning, sophisticated AR systems can be developed that learn and understand users' various poses, gestures and motions, both on the device and in the virtual world. Eye tracking is another rapidly adopted user input method which deep learning can help address. There are many benefits to eye tracking interactions in AR applications, such as hands-free experiences and foveated rendering [132]. More specifically, when using foveated rendering, the number of pixels shaded and the overall graphics computation are significantly reduced, and the drain on bandwidth and processing is greatly lessened, as only the user's direct field of gaze is rendered in full resolution [133]. Thus, more realistic and intuitive experiences with high-quality visuals and graphics can be created without slowing down the overall experience.

7. Conclusions
7. Conclusions

AR is a contemporary technology that aims at creating new user experiences and new ways for users to interact with both the real and the digital world. Moreover, it aims at satisfying users' need for real-time digital representation of information by augmenting and depicting this information in their physical environment. More specifically, it provides real-time access to rapidly flowing data at the right time and in the corresponding space. The functions, applications and services of this technology can be further enhanced and reinforced by combining it with other innovative technologies such as deep learning and semantic web, as well as knowledge graphs.

The aim of this study was to showcase how AR functions and services can be enhanced by integrating deep learning, semantic web and knowledge graphs, and to present the potential their combination offers for developing contemporary, user-friendly and user-centered intelligent applications. Hence, we briefly went over these technologies and presented and analyzed related studies on the development of AR applications and systems. Moreover, we discussed how the integration of deep learning, semantic web and knowledge graphs into AR can enhance the quality of experience and quality of service of AR applications in order to benefit users by improving their everyday experiences and life.

All in all, the use of AR in combination with other contemporary technologies may indeed enhance and reinforce the effectiveness and efficiency of its applications, functions and services. Based on the above, deep learning, semantic web and knowledge graphs should be considered suitable technologies for the development of interactive and intelligent AR applications.
This integration can meet users' needs and requirements for dynamic, adaptive and personalized technological applications and digital content. In addition, it provides smart solutions that facilitate users' everyday life and daily tasks, while also helping them explore and interact with virtual content, as well as with the real and virtual world, in real time and in a user-friendly manner.

Future work will concentrate on developing intelligent applications that combine AR, deep learning and semantic web technologies and utilize knowledge graphs. More specifically, these applications will focus on recognizing objects under different conditions, retrieving relevant information by exploiting semantically linked open data, and augmenting this information in users' natural environment in real time and in an interactive way.

References
[1] G. Lampropoulos, K. Siakas, T. Anastasiadis, Internet of things in the context of industry 4.0: An overview, International Journal of Entrepreneurial Knowledge 7 (1) (2019) 4–19. [2] K. Lee, Augmented reality in education and training, TechTrends 56 (2) (2012) 13–21.
[3] H.-K. Wu, S. W.-Y. Lee, H.-Y. Chang, J.-C. Liang, Current status, opportunities and challenges of augmented reality in education, Computers & Education 62 (2013) 41–49.
[4] M. Dunleavy, Design principles for augmented reality learning, TechTrends 58 (1) (2014) 28–34.
[5] N. Enyedy, J. A. Danish, D. DeLiema, Constructing liminal blends in a collaborative augmented-reality learning environment, International Journal of Computer-Supported Collaborative Learning 10 (1) (2015) 7–34.
[6] P. Chen, X. Liu, W. Cheng, R. Huang, A review of using augmented reality in education from 2011 to 2016, in: Innovations in smart learning, Springer, 2017, pp. 13–18.
[7] Á. Di Serio, M. B. Ibáñez, C. D. Kloos, Impact of an augmented reality system on students' motivation for a visual art course, Computers & Education 68 (2013) 586–596.
[8] C. Wasko, What teachers need to know about augmented reality enhanced learning environments, TechTrends 57 (4) (2013) 17–21.
[9] T. P. Caudell, D. W. Mizell, Augmented reality: An application of heads-up display technology to manual manufacturing processes, in: Proceedings of the twenty-fifth Hawaii international conference on system sciences, Vol. 2, IEEE, 1992, pp. 659–669.
[10] L. Johnson, A. Levine, R. Smith, S. Stone, The 2010 Horizon Report, 2010.
[11] F. Zhou, H. B.-L. Duh, M. Billinghurst, Trends in augmented reality tracking, interaction and display: A review of ten years of ismar, in: Proceedings of the 7th IEEE/ACM international symposium on mixed and augmented reality, IEEE Computer Society, 2008, pp. 193–202. [12] R. T. Azuma, A survey of augmented reality, Presence: Teleoperators & Virtual Environments 6 (4) (1997) 355–385. [13] R. T. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, B. MacIntyre, Recent advances in augmented reality, IEEE computer graphics and applications 21 (6) (2001) 34–47.
[14] M. Billinghurst, A. Clark, G. Lee, et al., A survey of augmented reality, Foundations and Trends® in Human–Computer Interaction 8 (2–3) (2015) 73–272.
[15] B. Furht, Handbook of augmented reality, Springer Science & Business Media, 2011.
[16] J. Carmigniani, B. Furht, M. Anisetti, P. Ceravolo, E. Damiani, M. Ivkovic, Augmented reality technologies, systems and applications, Multimedia Tools and Applications 51 (1) (2011) 341–377.
[17] M. K. McGee, Integral perception in augmented reality, Ph.D. thesis, Virginia Tech (1999).
[18] D. Van Krevelen, R. Poelman, A survey of augmented reality technologies, applications and limitations, International Journal of Virtual Reality 9 (2) (2010) 1–20.
[19] Vuforia, Inc., Vuforia, https://developer.vuforia.com/, 2019, (accessed 22 December 2019).
[20] Google, Inc., ARCore, https://developers.google.com/ar, 2019, (accessed 22 December 2019).
[21] Apple, Inc., ARKit, https://developer.apple.com/augmented-reality/, 2019, (accessed 22 December 2019).
[22] Wikitude, Inc., Wikitude, https://www.wikitude.com/, 2019, (accessed 22 December 2019).
[23] ARToolworks, Inc., ARToolKit, http://www.hitl.washington.edu/artoolkit/, 2019, (accessed 22 December 2019).
[24] D. Amin, S. Govilkar, Comparative study of augmented reality sdks, International Journal on Computational Science & Applications 5 (1) (2015) 11–26.
[25] ARmedia, Inc., ARmedia, http://www.armedia.it/, 2019, (accessed 22 December 2019).
[26] S. K. Kim, S.-J. Kang, Y.-J. Choi, M.-H. Choi, M. Hong, Augmented reality survey: from concept to application, KSII Transactions on Internet & Information Systems 11 (2) (2017).
[27] P. Nowacki, M. Woda, Capabilities of arcore and arkit platforms for ar/vr applications, in: International Conference on Dependability and Complex Systems, Springer, 2019, pp. 358–370.
[28] I. Wang, J. Smith, J. Ruiz, Exploring virtual agents for augmented reality, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, 2019, p. 281.
[29] Microsoft, Inc., HoloLens, https://www.microsoft.com/en-us/hololens, 2019, (accessed 22 December 2019).
[30] Magic Leap, Inc., Magic Leap, https://www.magicleap.com/, 2019, (accessed 22 December 2019). [31] Metavision, Inc., Meta 2, https://www.metavision.com/, 2019, (accessed 22 December 2019).
[32] Vuzix, Inc., Vuzix Blade, https://www.vuzix.com/, 2019, (accessed 22 December 2019).
[33] P. Milgram, F. Kishino, A taxonomy of mixed reality visual displays, IEICE Transactions on Information and Systems 77 (12) (1994) 1321–1329.
[34] V. Lepetit, On computer vision for augmented reality, in: 2008 international symposium on ubiquitous virtual reality, IEEE, 2008, pp. 13–16.
[35] D. Schmalstieg, A. Fuhrmann, G. Hesina, Z. Szalavári, L. M. Encarnação, M. Gervautz, W. Purgathofer, The studierstube augmented reality project, Presence: Teleoperators & Virtual Environments 11 (1) (2002) 33–54.
[36] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[37] L. Deng, D. Yu, et al., Deep learning: methods and applications, Foundations and Trends® in Signal Processing 7 (3–4) (2014) 197–387.
[38] S. Abrahams, D. Hafner, E. Erwitt, A. Scarpinelli, TensorFlow for Machine Intelligence: A Hands-on Introduction to Learning Algorithms, Bleeding Edge Press, 2016.
[39] I. B. Barbosa, M. Cristani, B. Caputo, A. Rognhaugen, T. Theoharis, Looking beyond appearances: Synthetic training data for deep cnns in re-identification, Computer Vision and Image Understanding 167 (2018) 50–62.
[40] B. Planche, Z. Wu, K. Ma, S. Sun, S. Kluckner, O. Lehmann, T. Chen, A. Hutter, S. Zakharov, H. Kosch, et al., Depthsynth: Real-time realistic synthetic data generation from cad models for 2.5D recognition, in: 2017 International Conference on 3D Vision (3DV), IEEE, 2017, pp. 1–10.
[41] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[42] Y. LeCun, P. Haffner, L. Bottou, Y. Bengio, Object recognition with gradient-based learning, in: Shape, contour and grouping in computer vision, Springer, 1999, pp. 319–345.
[43] R. Girshick, J. Donahue, T. Darrell, J. Malik, Region-based convolutional networks for accurate object detection and segmentation, IEEE transactions on pattern analysis and machine intelligence 38 (1) (2015) 142–158.
[44] R. Girshick, Fast r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448. [45] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, in: Advances in neural information processing systems, 2015, pp. 91–99.
[46] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788. [47] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[48] J. Dai, Y. Li, K. He, J. Sun, R-fcn: Object detection via region-based fully convolutional networks, in: Advances in neural information processing systems, 2016, pp. 379–387. [49] B. Zoph, Q. V. Le, Neural architecture search with reinforcement learning, arXiv:1611.01578 (2016). [50] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, in: European conference on computer vision, Springer, 2016, pp. 21–37.
[51] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969.
[52] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
[53] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
[54] M. Tan, Q. V. Le, Efficientnet: Rethinking model scaling for convolutional neural networks, arXiv:1905.11946 (2019).
[55] M. Krötzsch, V. Thost, Ontologies for knowledge graphs: Breaking the rules, in: International Semantic Web Conference, Springer, 2016, pp. 376–392.
[56] A. Singhal, Introducing the knowledge graph: things, not strings, Official Google Blog (2012).
[57] M. Krötzsch, Ontologies for knowledge graphs?, in: Description Logics, 2017.
[58] H. Paulheim, Knowledge graph refinement: A survey of approaches and evaluation methods, Semantic Web 8 (3) (2017) 489–508.
[59] J. Yan, C. Wang, W. Cheng, M. Gao, A. Zhou, A retrospective of knowledge graphs, Frontiers of Computer Science 12 (1) (2018) 55–74.
[60] A. Hogan, D. Brickley, C. Gutierrez, A. Polleres, A. Zimmermann, (Re)defining knowledge graphs, in: Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371), Vol. 8, Dagstuhl Reports, 2019, pp. 74–79.
[61] A. New, S. M. Rashid, J. S. Erickson, D. L. McGuinness, K. P. Bennett, Semantically-aware population health risk analyses, arXiv:1811.11190 (2018).
[62] H. Zhao, Y. Wang, A. Lin, B. Hu, R. Yan, J. McCusker, W. Chen, D. L. McGuinness, L. Schadler, L. C. Brinson, Nanomine schema: An extensible data representation for polymer nanocomposites, APL Materials 6 (11) (2018).
[63] L. Ehrlinger, W. Wöß, Towards a definition of knowledge graphs, SEMANTiCS (Posters, Demos, SuCCESS) 48 (2016).
[64] J. Jetschni, V. G. Meister, Schema engineering for enterprise knowledge graphs: A reflecting survey and case study, in: 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), IEEE, 2017, pp. 271–277.
[65] G. Antoniou, F. Van Harmelen, A semantic web primer, MIT Press, 2004.
[66] T. Berners-Lee, J. Hendler, O. Lassila, et al., The semantic web, Scientific American 284 (5) (2001) 28–37.
[67] A. Soylu, F. Mödritscher, P. De Causmaecker, Ubiquitous web navigation through harvesting embedded semantic data: A mobile scenario, Integrated Computer-Aided Engineering 19 (1) (2012) 93–109.
[68] N. Shadbolt, T. Berners-Lee, W. Hall, The semantic web revisited, IEEE Intelligent Systems 21 (3) (2006) 96–101.
[69] T. R. Gruber, Toward principles for the design of ontologies used for knowledge sharing?, International Journal of Human-Computer Studies 43 (5-6) (1995) 907–928.
[70] C. Feilmayr, W. Wöß, An analysis of ontologies and their success factors for application to business, Data & Knowledge Engineering 101 (2016) 1–23.
[71] C. Van Aart, B. Wielinga, W. R. Van Hage, Mobile cultural heritage guide: location-aware semantic search, in: International Conference on Knowledge Engineering and Knowledge Management, Springer, 2010, pp. 257–271. [72] C. Bizer, T. Heath, T. Berners-Lee, Linked data: The story so far, in: Semantic services, interoperability and web applications: emerging concepts, IGI Global, 2011, pp. 205–227.
[73] L. Nixon, J. Grubert, G. Reitmayr, J. Scicluna, Semantics enhancing augmented reality and making our reality smarter, in: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", Springer, 2012, pp. 863–870.
[74] D. L. McGuinness, F. Van Harmelen, et al., OWL web ontology language overview, W3C Recommendation 10 (10) (2004) 2004.
[75] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, M. Dean, et al., SWRL: A semantic web rule language combining OWL and RuleML, W3C Member Submission 21 (79) (2004) 1–31.
[76] R. Hervás, J. Bravo, J. Fontecha, V. Villarreal, Achieving adaptive augmented reality through ontological context-awareness applied to aal scenarios, Journal of Universal Computer Science 19 (9) (2013) 1334–1349.
[77] M. Vuolle, M. Tiainen, T. Kallio, T. Vainio, M. Kulju, H. Wigelius, Developing a questionnaire for measuring mobile business service experience, in: Proceedings of the 10th international conference on Human computer interaction with mobile devices and services, ACM, 2008, pp. 53–62.
[78] T. Matuszka, A. Kiss, Alive cemeteries with augmented reality and semantic web technologies, International Journal of Computer, Information Science and Engineering 8 (2014) 32–36.
[79] O. Akgul, H. I. Penekli, Y. Genc, Applying deep learning in augmented reality tracking, in: 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), IEEE, 2016, pp. 47–54.
[80] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105.
[81] E. Rublee, V. Rabaud, K. Konolige, G. R. Bradski, Orb: An efficient alternative to sift or surf, in: IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2564–2571.
[82] G. Bradski, The opencv library, Dr. Dobb's Journal of Software Tools 25 (2000) 120–125.
[83] S. Taylor, T. Drummond, Binary histogrammed intensity patches for efficient and robust matching, International Journal of Computer Vision 94 (2) (2011) 241–265.
[84] M. Limmer, J. Forster, D. Baudach, F. Schüle, R. Schweiger, H. P. Lensch, Robust deep-learning-based road-prediction for augmented reality navigation systems at night, in: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2016, pp. 1888–1895.
[85] F. Schüle, R. Schweiger, K. Dietmayer, Augmenting night vision video images with longer distance road course information, in: 2013 IEEE Intelligent Vehicles Symposium, IEEE, 2013, pp. 1233–1238.
[86] R. Risack, P. Klausmann, W. Krüger, W. Enkelmann, Robust lane recognition embedded in a real-time driver assistance system, in: Proc. IEEE International Conference on Intelligent Vehicles, 1998, pp. 35–40.
[87] C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE transactions on pattern analysis and machine intelligence 35 (8) (2012) 1915–1929.
[88] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556 (2014).
[89] H. Kim, T. Matuszka, J.-I. Kim, J. Kim, W. Woo, Ontology-based mobile augmented reality in cultural heritage sites: information modeling and user study, Multimedia Tools and Applications 76 (24) (2017) 26001–26029.
[90] M. Schröder, H. Ritter, Deep learning for action recognition in augmented reality assistance systems, in: ACM SIGGRAPH 2017 Posters, ACM, 2017.
[91] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440. [92] L. Abdi, A. Meddeb, Deep learning traffic sign detection, recognition and augmentation, in: Proceedings of the Symposium on Applied Computing, ACM, 2017, pp. 131–136.
[93] D. Cireşan, U. Meier, J. Masci, J. Schmidhuber, Multi-column deep neural network for traffic sign classification, Neural networks 32 (2012) 333–338.
[94] J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel, Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition, Neural networks 32 (2012) 323–332. [95] P. Sermanet, Y. LeCun, Traffic sign recognition with multi-scale convolutional networks., in: International Joint Conference on Neural Networks (IJCNN), 2011, pp. 2809–2813. [96] F. Zaklouta, B. Stanciulescu, O. Hamdoun, Traffic sign classification using kd trees and random forests, in: International Joint Conference on Neural Networks (IJCNN), 2011, pp. 2151–2155.
[97] J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel, The German traffic sign recognition benchmark: A multi-class classification competition, in: International Joint Conference on Neural Networks (IJCNN), 2011, pp. 1453–1460.
[98] J. Rao, Y. Qiao, F. Ren, J. Wang, Q. Du, A mobile outdoor augmented reality method combining deep learning object detection and spatial relationships for geovisualization, Sensors 17 (9) (2017) 1951.
[99] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang, Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems, arXiv:1512.01274 (2015).
[100] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer, Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size, arXiv:1602.07360 (2016).
[101] P. Contreras, D. Chimbo, A. Tello, M. Espinoza, Semantic web and augmented reality for searching people, events and points of interest within of a university campus, in: 2017 XLIII Latin American Computer Conference (CLEI), IEEE, 2017, pp. 1–10.
[102] M. C. Suárez-Figueroa, Neon methodology for building ontology networks: specification, scheduling and reuse, Ph.D. thesis, Informatica (2010).
[103] D. Połap, K. Kęsik, K. Książek, M. Woźniak, Obstacle detection as a safety alert in augmented reality models by the use of deep learning techniques, Sensors 17 (12) (2017) 1–16.
[104] A. Katsaros, E. Keramopoulos, Farmar, a farmer’s augmented reality application based on semantic web, in: 2017 South Eastern European Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), IEEE, 2017, pp. 1–6.
[105] A. Katsaros, E. Keramopoulos, M. Salampasis, Cultivation optimization using augmented reality, in: 8th International Conference on Information & Communication Technologies in Agriculture, Food and Environment (HAICTA), 2017, pp. 805–811.
[106] C.-H. Lin, Y. Chung, B.-Y. Chou, H.-Y. Chen, C.-Y. Tsai, A novel campus navigation app with augmented reality and deep learning, in: 2018 IEEE International Conference on Applied System Invention (ICASI), IEEE, 2018, pp. 1075–1077.
[107] D. G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the International Conference on Computer Vision (ICCV), Vol. 2, 1999, pp. 1150–1157.
[108] R. Wang, H. Lu, J. Xiao, Y. Li, Q. Qiu, The design of an augmented reality system for urban search and rescue, in: 2018 IEEE International Conference on Intelligence and Safety for Robotics (ISR), IEEE, 2018, pp. 267–272.
[109] R. Mur-Artal, J. D. Tardós, Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras, IEEE Transactions on Robotics 33 (5) (2017) 1255–1262.
[110] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng, Ros: an open-source robot operating system, in: ICRA workshop on open source software, 2009.
[111] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, L. Van Gool, One-shot video object segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2017, pp. 221–230.
[112] H. Subakti, J.-R. Jiang, Indoor augmented reality using deep learning for industry 4.0 smart factories, in: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, IEEE, 2018, pp. 63–68. [113] S. A. Boyer, SCADA: supervisory control and data acquisition, International Society of Automation, 2009.
[114] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al., Tensorflow: Large-scale machine learning on heterogeneous distributed systems, arXiv:1603.04467 (2016). [115] J.-F. Lalonde, Deep learning for augmented reality, in: 2018 17th Workshop on Information Optics (WIO), IEEE, 2018, pp. 1–3.
[116] J. Aliprantis, E. Kalatha, M. Konstantakis, K. Michalakis, G. Caridakis, Linked open data as universal markers for mobile augmented reality applications in cultural heritage, in: Digital Cultural Heritage, Springer, 2018, pp. 79–90. [117] Z. Li, E. Butler, K. Li, A. Lu, S. Ji, S. Zhang, Large-scale exploration of neuronal morphologies using deep learning and augmented reality, Neuroinformatics 16 (3-4) (2018) 339–349. [118] M. Englert, M. Klomann, K. Weber, P. Grimm, Y. Jung, Enhancing the ar experience with machine learning services, in: The 24th International Conference on 3D Web Technology, ACM, 2019, pp. 1–9.
[119] C. D. Flores-Flores, J. L. Sánchez-Cervantes, L. Rodríguez-Mazahua, L. O. Colombo-Mendoza, A. Rodríguez-González, Arlod: Augmented reality mobile application integrating information obtained from the linked open drug data, in: Current Trends in Semantic Web Technologies: Theory and Practice, Springer, 2019, pp. 269–292.
[120] V. Reynolds, M. Hausenblas, A. Polleres, M. Hauswirth, V. Hegde, Exploiting linked open data for mobile augmented reality, in: W3C workshop: augmented reality on the web, Vol. 1, June 2010.
[121] S. Vert, R. Vasiu, Integrating linked data in mobile augmented reality applications, in: International Conference on Information and Software Technologies, Springer, 2014, pp. 324–333.
[122] P. Makkonen, G. Lampropoulos, K. Siakas, Security and privacy issues and concerns about the use of social networking services, in: E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, Association for the Advancement of Computing in Education (AACE), 2019, pp. 457–466.
[123] S. Vert, R. Vasiu, Relevant aspects for the integration of linked data in mobile augmented reality applications for tourism, in: International Conference on Information and Software Technologies, Springer, 2014, pp. 334–345.
[124] P. Shvaiko, J. Euzenat, Ontology matching: state of the art and future challenges, IEEE Transactions on Knowledge and Data Engineering 25 (1) (2011) 158–176.
[125] K. Kim, M. Billinghurst, G. Bruder, H. B.-L. Duh, G. F. Welch, Revisiting trends in augmented reality research: A review of the 2nd decade of ismar (2008–2017), IEEE transactions on visualization and computer graphics 24 (11) (2018) 2947–2962.
[126] I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT Press, 2016.
[127] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9. [128] V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE transactions on pattern analysis and machine intelligence 39 (12) (2017) 2481–2495. [129] A. Kendall, M. Grimes, R. Cipolla, Posenet: A convolutional network for real-time 6-dof camera relocalization, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 2938–2946. [130] A. Kendall, R. Cipolla, Modelling uncertainty in deep learning for camera relocalization, in: 2016 IEEE international conference on Robotics and Automation (ICRA), IEEE, 2016, pp. 4762–4769.
[131] J. Wu, L. Ma, X. Hu, Delving deeper into convolutional neural networks for camera relocalization, in: 2017 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2017, pp. 5644–5651.
[132] J. Kim, Y. Jeong, M. Stengel, K. Akşit, R. Albert, B. Boudaoud, T. Greer, J. Kim, W. Lopes, Z. Majercik, et al., Foveated ar: dynamically-foveated augmented reality display, ACM Transactions on Graphics (TOG) 38 (4) (2019) 99.
[133] B. Guenter, M. Finch, S. Drucker, D. Tan, J. Snyder, Foveated 3d graphics, ACM Transactions on Graphics (TOG) 31 (6) (2012) 164.
Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: