Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies


Knowledge-Based Systems xxx (xxxx) xxx

Contents lists available at ScienceDirect

Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys

Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies✩

Silvio Domingos Cardoso, Marcos Da Silveira, Cédric Pruski



LIST, Luxembourg Institute of Science and Technology, 5, avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg

Article info

Article history: Received 19 August 2019; received in revised form 8 January 2020; accepted 10 January 2020; available online xxxx

Keywords: Knowledge graphs; Ontology evolution; Biomedical ontology; Versioning

Abstract

With the advances of Artificial Intelligence, the need for annotated data increases. However, the quality of these annotations can be impacted by the evolution of domain knowledge since the relations between successive versions of ontologies are rarely described and the history of concepts is not kept at the ontology level. As a consequence, using datasets annotated at different times becomes a real challenge for data- and knowledge-intensive systems. This work presents a way to address this problem. We introduce a Historical Knowledge Graph (HKG), where information from previous versions of an ontology can be found inside a single graph, reducing storage space (no need for versioning) and data treatment time (no need for laborious analysis of each version of the ontology). The HKG proposed in this work represents the evolutionary aspects of the knowledge in a structural way. Examples of the applicability of an HKG for information retrieval and the maintenance of semantic annotations show the capability of our approach for improving the quality of existing techniques. © 2020 Published by Elsevier B.V.

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105508.

https://doi.org/10.1016/j.knosys.2020.105508 0950-7051/© 2020 Published by Elsevier B.V.

Please cite this article as: S.D. Cardoso, M. Da Silveira and C. Pruski, Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies, Knowledge-Based Systems (2020) 105508, https://doi.org/10.1016/j.knosys.2020.105508.

1. Introduction

Ontologies and, more recently, knowledge graphs (KGs) [1] play a key role in ensuring semantic interoperability between information systems. They are part of a chain that intends to transform isolated data into shareable and comprehensible pieces of knowledge [2]. Once created, concepts of ontologies can be linked to data in order to generate annotated digital resources (i.e., semantic annotations), making the semantics of the data explicit for humans and machines. Tools for information retrieval, data fusion, data integration, etc. [3–5] can profit from the semantic annotations (metadata) to better select and disambiguate the analyzed information. Linked Data is an example of metadata where links are used to connect pieces of information and create a vast network of information [6]. In the medical domain, electronic health records are often annotated with terms from the ICD (International Classification of Diseases) or SNOMED-CT (Systematized Nomenclature of Medicine-Clinical Terms) to make clear which disease or clinical procedure the document refers to. Another example of the use of ontologies is the NLM (National Library of Medicine) browser, where all catalogs of scientific publications are indexed with terms from MeSH (Medical Subject Headings, https://www.nlm.nih.gov/mesh/meshhome.html) and can be accessed via its web browser.

However, the dynamic nature of knowledge pushes ontologists to constantly revise the content of ontologies in order to improve their quality and to take new findings into account [7]. For instance, new findings can lead to the content (description) of a concept being modified and made more specific. One possibility to improve the ontology would be to split the concept, i.e., to replace the original concept with two (or more) new concepts. Another possibility would be to add the new concepts as sub-concepts of (with a hierarchical link to) the original one. The consequences of each change on dependent artifacts are potentially different, and a detailed analysis is needed to evaluate these consequences and implement the necessary changes. Automating this process is a challenging task, and our work contributes to meeting this challenge.

In our previous work [8], we addressed several problems related to the maintenance of mappings between ontologies. In this paper, we focus on tracking changes in ontologies and on making this information available to any system that needs to cope with the side effects of these changes. Common problems reported by users of ontologies are limited access to the information about ontology changes (change logs) and the format in which this information is published, which is not always machine-interpretable and therefore increases the possibility of errors when parsing. The frequency and the format with which change logs are published vary according to the publisher [9]. In the biomedical domain, there are initiatives to disseminate best practice guidelines to build ontologies [10] and create repositories of different versions of ontologies [11]. However, the maintenance of artifacts that depend on the ontology has still not been sufficiently studied.

From previous studies [12,13], we observed that, on average, 10% of the content of large ontologies from the biomedical domain changes every year. The impact of these changes on semantic annotations was evaluated in [14,15]. This analysis provides a rough view of what would happen to annotated data if no curation actions were implemented. Projecting the yearly change rate and the impact on the annotations over the next six years, we estimate that more than 30% of the semantic annotations will become outdated (or lost). Despite the effort of data owners to deal with this problem, the manual curation of impacted data is not a viable long-term solution because of the size of the datasets. For instance, SNOMED-CT has more than 300 000 concepts and is used by hundreds of hospitals worldwide to annotate millions of electronic health records.

Based on these observations, this paper proposes the historical knowledge graph (HKG) as a way to regroup different versions of an ontology into one single knowledge graph. This is the first work that uses a graph to aggregate information about the validity of concepts as well as about the types of changes applied to each concept (from simple changes like addition and deletion to more complex ones like splitting concepts or moving them from one region of the ontology to another). Traditional ways of storing ontology versions use two separate files, one to publish snapshots of the dynamic ontology (versions) and the other to publish the change logs.
The HKG was designed to replace both files with a single file in which all information is integrated in a structured format that is interpretable by both humans and machines. Another original aspect of our approach is that both snapshots and change logs can be automatically rebuilt at any time using only the information contained in the HKG. We evaluate our approach empirically using realistic use cases from the biomedical domain. The quantity and quality of ontologies from the biomedical domain allow us to evaluate our approach over a significant period (ten years) and on several ontologies, reducing bias that could potentially impact the evaluation process.

The goal of building the HKG is to make the implementation of some tasks easier, especially those requiring access to several versions of the ontology. These include tools for mapping maintenance, temporal information retrieval, indexing historical datasets, and the maintenance of semantic annotations. In summary, the contributions of this paper are:

• to propose a novel format to store multiple versions of an ontology that reduces the storage space and integrates snapshots of ontologies with the change logs.
• to apply this method to four large and heterogeneous ontologies widely used in the biomedical domain.
• to identify and characterize changes in these ontologies and generate an HKG for each of them.
• to evaluate the use of the HKG on three use cases that were designed to support information retrieval and annotation maintenance tasks.

This paper is structured as follows: Section 2 presents related work. Section 3 contains the description of the historical knowledge graph that we propose, mainly its conceptualization, formalization and representation. Section 4 describes the three use cases implemented to evaluate our knowledge graph. It includes an experimental evaluation and a discussion of the results. Section 5 concludes the article and outlines future work.

2. Related work

The significant number of ontologies in the biomedical domain is an indicator of the importance of word meanings when information needs to be shared. Different representation formats are adopted, which affects the expressivity of the ontology. Best practices to build biomedical ontologies were provided by the OBO Foundry [10] and led to the implementation of a centralized repository for biomedical ontologies (and their versions). UMLS (Unified Medical Language System [16]) was developed to connect ontologies. The idea was to aggregate vocabularies from different sources, distinguishing their respective versions, with a meta-thesaurus to maximize coverage of the domain. More recent initiatives include the BioPortal [11] and AgroPortal [17]. They offer a set of services to publish and exploit biomedical (BioPortal) and agronomic (AgroPortal) ontologies. All these tools and approaches aim to manage different versions of an ontology but consider these versions independently of each other. An integrated view that allows navigation through different versions when analyzing the concepts of an ontology is still an open issue. To address this problem, we propose a version unification method that presents all information concerning the different versions of a given ontology in the same model, called an historical knowledge graph (HKG).

In the literature, we found works that address part of the problem that we are interested in. Related approaches recently published on this subject aim to optimize at least one of the following three criteria:

1. Reduction of storage space for graph snapshots,
2. Effective search for temporal patterns in the graph,
3. Improved indexing of dynamic information.

One of the first methods for resolving the problem of storage space for evolving graphs was proposed by Salzberg and Tsotras [18].
They introduced the Copy and Log approach in order to highlight the difference between storing a snapshot of the graph whenever a change is implemented (Copy) and storing the first version of the graph and all change logs (Log). Caro et al. [19] propose several strategies to compress the contents of a temporal graph by introducing auto-indexed data structures. The authors of [20] represent the ontology versions with tables in a relational database. Labouseur et al. [21] proposed using a parallel graph database (G*), which takes advantage of the commonalities found between snapshots to reduce the total space cost. Their approach aims to save space on elements that do not change over time. In these approaches, the type of evolution or the measures used to compute the evolution are not preserved in the database, preventing any evaluation or interpretation of the evolution of concepts. Recently, Le et al. [22] proposed a frequent subgraph algorithm for large weighted graphs, which is based on two effective strategies for mining weighted subgraphs. The proposed algorithm reduces both the processing time and the storage space needed.

Approaches dealing with the second problem focus on the search for patterns in evolving graphs. The authors of [23] propose enriching the graph with the notion of a validity interval, representing the period in which the properties of graph elements apply. The objective of [24] is to optimize the discovery of temporal patterns in the graph by proposing an efficient representation of this graph in order to reduce the search space. In this approach, the authors are interested in identifying links between two nodes in one extract of the graph, before checking whether these links are true for the other extracts. The difficulty of applying these methods to determine how a concept has evolved lies in the way the graph versions are built and analyzed. The temporal patterns are identified within one version at a time, and there are no evolutionary links between versions of the same concept.

Finally, in the third category, we find Khurana and Deshpande, who propose DeltaGraph [25] with a tree-like index structure that can perform singlepoint and multipoint graph retrieval. The authors extended their approach with the Temporal Graph Index (TGI) [26], which added mechanisms to enable vertex-centric operations (i.e., operations on the modified vertex). DeltaGraph uses edges to represent events occurring between two snapshots of the ontology. These events are limited to the addition or deletion of elements. Snapshots are taken at the same frequency and are compressed to become a vertex of the DeltaGraph. The sub-graphs of the ontology affected by an event become a new vertex of DeltaGraph. There is no one-to-one relation between the vertices and edges of DeltaGraph and those of the original ontology. Extra processing time is required to convert DeltaGraph into a graph that can be easily explored by information retrieval tools. Unlike DeltaGraph, we propose a one-to-one translation format that respects the existing hierarchical properties of the ontologies, but adds more information about their evolution.

The approaches described in this section partially address the problem of the representation of dynamic KGs. Indeed, each of the analyzed approaches has at least one of the following limitations: (i) the relation between the different versions of an ontology is not specified; (ii) the existing graphs do not allow representing the knowledge contained in the ontology; (iii) the algorithms exploiting these graphs are not scalable; (iv) complex events (e.g., move or split of concepts) cannot be represented. In the following section, we describe our proposal to address these shortcomings, and we also position our approach as an alternative for optimizing the storage space of graph snapshots.

Fig. 1. Metamodel used to represent the history of a concept of an ontology.

3. Creating historical knowledge graph

Graphs provide a natural format to represent knowledge and the interactions between pieces of knowledge, for example, to define connections within social networks, collaborative networks, cinema (actors with films), scientific publications, biomedicine (diseases and symptoms and treatments), etc. Industries from many domains are transforming their databases into knowledge bases and they often adopt the graph format to save or share them. For instance, Google [1] has the Google Knowledge Graph that organizes information collected when users search the Web together with content from other sources; the BBC is providing access to its knowledge graph (https://www.bbc.co.uk/ontologies), where users can search for news that has been published; and Thomson-Reuters created a detailed knowledge graph describing companies (https://developers.thomsonreuters.com/tr-knowledge-graph). The main advantage of using ontologies and knowledge graphs is their ability to represent complex relationships between entities while relying on solid mathematical foundations. They also provide the possibility of extending the content of a KG by connecting it with other existing KGs (e.g., via mappings). The increasing adoption of KGs to represent the knowledge of a domain is improving the interoperability between systems, making exchanged information understandable for all (with no ambiguity). However, creating and maintaining a well-formulated KG requires substantial human effort. The focus of this research work is on extending the KG format in order to trace the historical changes of ontologies, providing an alternative way of dealing with the impact of ontology evolution on the dependent artifacts (like mappings and semantic annotations).

3.1. Conceptualization and formalization of the historical knowledge graph

In this section, we introduce the notion of the historical knowledge graph (HKG), whose objective is to keep track of changes applied to an ontology over a period of time. In our work, we consider a concept as a composition of attributes, where the attributes are strings describing the concept, grouped as identifiers, labels, synonyms, comments, etc. This kind of representation is often adopted in the biomedical domain, where we evaluate our methods. For this work, we transformed a biomedical ontology into a historical knowledge graph, where the vertices (or nodes) represent concepts and the edges represent the semantic relationships between those concepts. In the HKG, the relations between concepts were simplified to hierarchical (i.e., subClassOf or IsA) and evolutionary (linking versions of concepts) ones. An ontology (O) is defined as a set of concepts (C), relations (R), and attributes (A):

$$O = (C, R, A)$$

The resulting HKG is defined as a set of vertices V and a set of edges E:

$$HKG = (V, E)$$

where

$$V = \{\, (c, pv, at) \mid c \in C,\ pv = (s, e),\ s, e \in \mathbb{N},\ 0 \le s \le e,\ at \in A \,\} \tag{1}$$

is the set of vertices of the graph, and

$$E = \{\, (u, v, rel) \mid u, v \in V,\ rel \in \{hierarchical, evolutionary\} \,\} \tag{2}$$

is the set of edges. Each vertex of the graph contains a concept c from the ontology O and a period of validity pv, composed of the start date s, indicating when the concept was first added to the HKG, and the end date e, indicating when the concept evolved to a new version or was deleted from the ontology. Note that deleting a concept from the ontology does not delete it from the HKG; instead, it only changes the value of e. According to our definition, concepts are associated with attributes at, and the attribute values (e.g., the values of synonyms, comments, etc.) describe the concept c. The edges indicate the relation rel between two HKG vertices u and v. rel has two types: hierarchical or evolutionary. There is no period of validity for edges because they are static information. When a new version of a concept is added to the HKG, all edges of the previous version of the concept are duplicated and adapted to take into account the vertex that changed. An edge can also be deleted in the ontology. In this case, it is characterized as a move operation in the HKG. The meta-model used to represent HKG elements is given in Fig. 1.

An example of the consequences of graph versioning is illustrated in Fig. 2. The evolution of the concept Spinal Muscular Atrophy from MeSH is presented. In this figure, we consider the period from 2009 to 2013. The changes in the concept are in dark blue; the unchanged part is in light blue. To improve the readability of the figure, we intentionally hid the hierarchical relations. Observe that a new concept (ID: D055534) was added in 2010. The analysis of the changes in the attributes indicates that it is a potential case of Split. Thus, an Evolution relation is created. Note that in 2013, the ID of this concept was changed (new ID: D020966), so a new Evolution relation was created. If this concept was used to annotate documents during the indicated period, then the retrieval of annotated documents may be compromised. For instance, for a user query for Spinal and Bulbar Muscular Atrophy in 2014, only documents annotated after 2010 will be retrieved (before 2010, these documents were annotated with the term Spinal Muscular Atrophy). If the query uses the concept code D020966 instead, then only documents from 2013 will be retrieved. To retrieve documents from 2009, the user will be required to write a complex query with conjunctive/disjunctive operators and filters. Tracing the history of each concept with an HKG allows navigating through the graph and discovering relations that were implicit when considering versions. Fig. 3 shows how the history of the concept Spinal Muscular Atrophy is represented using an HKG. The upper part illustrates the representation of the versions of a concept. The lower part shows the equivalent HKG.

3.2. Construction of the historical knowledge graph

The challenge here is to characterize the evolution of concepts accurately. In a previous work [27], we detail the characterization process. In brief, the differences between the versions of the ontology are first computed with COnto-Diff [28].
As input, we provide two versions of one ontology, and the expected output is the set of the following changes: delC (deleted concept), addC (added concept), split (decomposition of a concept into two or more), move (the ID of at least one parent changed), chgAttValue (change attribute value), delA (delete attribute) and addA (add attribute). All these types of changes are grouped into the Evolutionary relation when creating the HKG. Grouping the changes avoids having several evolutionary relations between two versions of a concept. We observed in our experiments that the evolution of a concept is often associated with many changes (e.g., addA, delA, move, etc.). In cases where detailed information about the changes is necessary, it is possible to add sub-relations, but these cases are not addressed in this paper. In Algorithm 1, the function computeDIFF computes the difference between two versions of an ontology, and the outcome is saved in a database (referred to as Diff_i). In a second step, the content of the database is used to establish the links between the versions.

The first instance of the HKG contains only the first version of the ontology. The function firstHKG_Generator of Algorithm 1 translates the current ontology into the HKG (cf. Fig. 1). We use the concept code and the version number to create the HKG concept ID. For instance, in the first version of the HKG in Fig. 3, the concept Spinal Muscular Atrophy from the MeSH ontology 2009AA (UMLS) has the ID D009134_2009. The attributes added to the HKG are the label and the synonyms of this concept. We repeat this process for all concepts that were impacted by the changes. Then, we look for relationships. At this stage, we are only interested in Hierarchical relationships, i.e., subClassOf or is_A relationships.
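The definitions of Section 3.1 and the construction steps outlined for Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `first_hkg`, `close`, `evolve` and `snapshot` are hypothetical, an open validity period is encoded as `end = None`, and the period is read as the half-open interval [s, e). The handling of changes follows the rules this section gives for HKG_Generator (close the previous version, add the new one, link both with an evolutionary edge).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vertex:
    code: str                   # concept code in the ontology, e.g. "D009134"
    start: int                  # s: version in which this concept version appears
    end: Optional[int] = None   # e: version of evolution/deletion (None = still valid; assumption)
    attributes: dict = field(default_factory=dict)  # at: label, synonyms, ...

    @property
    def vid(self) -> str:
        # HKG identifier built from concept code and version, as in D009134_2009
        return f"{self.code}_{self.start}"


@dataclass
class HKG:
    vertices: dict = field(default_factory=dict)  # vid -> Vertex
    edges: set = field(default_factory=set)       # (source vid, target vid, rel)


def first_hkg(concepts, is_a, version):
    """Step 1: load the first version (concepts: code -> attributes; is_a: (child, parent))."""
    hkg = HKG()
    for code, attrs in concepts.items():
        v = Vertex(code, version, attributes=dict(attrs))
        hkg.vertices[v.vid] = v
    for child, parent in is_a:
        hkg.edges.add((f"{child}_{version}", f"{parent}_{version}", "hierarchical"))
    return hkg


def current(hkg, code):
    """Return the still-valid vertex of a concept code, or None."""
    for v in hkg.vertices.values():
        if v.code == code and v.end is None:
            return v
    return None


def close(hkg, code, version):
    """Deleting only closes the validity period; the vertex stays in the HKG."""
    old = current(hkg, code)
    if old is not None:
        old.end = version
    return old


def evolve(hkg, old, new_code, attrs, version):
    """Add a new concept version and link it to `old` with an evolutionary edge."""
    new = Vertex(new_code, version, attributes=dict(attrs))
    hkg.vertices[new.vid] = new
    # Duplicate the hierarchical edges of the previous version
    # (simplification: the other endpoint is kept unchanged).
    for u, v, rel in list(hkg.edges):
        if rel == "hierarchical" and old.vid in (u, v):
            hkg.edges.add((new.vid if u == old.vid else u,
                           new.vid if v == old.vid else v, rel))
    hkg.edges.add((old.vid, new.vid, "evolutionary"))
    return new


def snapshot(hkg, version):
    """Vertices valid at `version`, reading pv as the half-open interval [s, e)."""
    return {vid for vid, v in hkg.vertices.items()
            if v.start <= version and (v.end is None or version < v.end)}


# Running example from the text; the labels are illustrative.
hkg = first_hkg({"D009134": {"label": "Spinal Muscular Atrophy"}}, [], 2009)
old = close(hkg, "D009134", 2010)
evolve(hkg, old, "D009134", {"label": "Spinal Muscular Atrophy"}, 2010)
evolve(hkg, old, "D055534", {"label": "Spinal and Bulbar Muscular Atrophy"}, 2010)
print(sorted(snapshot(hkg, 2009)), sorted(snapshot(hkg, 2010)))
```

Because every vertex carries its own validity period, a snapshot of any version can be rebuilt by filtering the vertices, which is one way to read the claim that snapshots can be regenerated from the HKG alone.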
Algorithm 1: Building an HKG from several versions of an ontology

Input: KGs: [KG_i]; version IDs: [V_i]
Output: Historical Knowledge Graph: HKG
HKG ← ∅
i ← 0
forall KG_i ∈ KGs do
    if i == 0 then
        HKG ← firstHKG_Generator(KG_i, V_i)
    else
        Diff_i ← computeDIFF(KG_{i−1}, KG_i)
        forall df ∈ Diff_i do
            HKG ← HKG_Generator(df)
    i++
return HKG

Once the first version of the HKG was completed, we started the second step, which uses the changes (df) calculated by COnto-Diff in order to modify the existing HKG. The function HKG_Generator of Algorithm 1 modifies the HKG. For instance, if a concept is

deleted, then the end_date is set to the date of the considered ontology version (e.g., D009134_2009 has end_date = 2010). If there are new concepts, they are added in the same way as in the initial version of the HKG (the start_date is set to the ontology version date, e.g., 2010). When attributes are modified, we first execute the delete function for the current concept (i.e., set the end_date value) and then create a copy of the concept with the updated attributes (and set the start_date value). These two versions of the concept are linked with the Evolutionary relation. For other complex changes like Split or Move, we do the same: the delete function is executed on the current version of the concept, the add function is executed for the new one(s) with updated attributes and the new Hierarchical relations, and the Evolutionary relation(s) are added to the HKG. The expected outcome is similar to the graph in the lower part of Fig. 3.

4. Use cases for exploiting the historical knowledge graph

We designed our historical knowledge graph to support two knowledge-intensive tasks. The first one deals with information retrieval as described in [29], and the second one is devoted to the maintenance of semantic annotations [30]. In this section, we detail the use cases implementing the HKG, the associated experimental methodology and the datasets used.

4.1. Description of the use cases

In many information retrieval applications, documents are retrieved based on the keywords that best characterize their content. This is why the selection of the keywords and their combination in the query are important. Consider for instance a set of medical publications collected over several decades. There is a gap between the keywords used to index the publications (coming from the different versions of an ontology) and the ones used to generate the query (coming from the last version of the ontology).
For instance, the term “cancer” may have been used to annotate a document in 2001; however, in 2016 this term was replaced by “malignant neoplasm” in the ontology. Thus, in order to build a query, users must be aware of the terminological evolution of the word “cancer” to retrieve all the documents dealing with this particular disease and collected from 2001 to 2019, i.e., the query must contain both terms “cancer” and “malignant neoplasm”. Our HKG makes it easier to create complete queries in the described context. A service can be implemented over the HKG to return all terms that must be added to the query. The efficacy of this kind of service is evaluated by our first use case.
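A service of this kind could be sketched as follows. This is a minimal illustration rather than the service evaluated in the paper: the edge triples (source, target, relation), the vertex identifiers `C1_2001` and `C1_2016`, and the function names are assumptions made for the example. The sketch collects the labels of every earlier and later version reachable through evolutionary edges and joins them into a disjunctive query.

```python
def reach(edges, start, forward=True):
    """Vertices reachable from `start` along evolutionary edges in one direction."""
    step = {}
    for u, v, rel in edges:
        if rel == "evolutionary":
            src, dst = (u, v) if forward else (v, u)
            step.setdefault(src, set()).add(dst)
    seen, stack = set(), [start]
    while stack:
        for nxt in step.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def expand_terms(labels, edges, term):
    """Return the query term plus the labels of its earlier and later versions."""
    terms = {term}
    for vid, label in labels.items():
        if label.lower() == term.lower():
            path = reach(edges, vid, forward=True) | reach(edges, vid, forward=False)
            terms.update(labels[v] for v in path)
    return sorted(terms)


def disjunctive_query(terms):
    """Join the terms with OR, quoting each one for exact matching."""
    return " OR ".join(f'"{t}"' for t in terms)


# Hypothetical vertex identifiers; the labels follow the example in the text.
labels = {"C1_2001": "cancer", "C1_2016": "malignant neoplasm"}
edges = {("C1_2001", "C1_2016", "evolutionary")}

terms = expand_terms(labels, edges, "malignant neoplasm")
query = disjunctive_query(terms)
print(query)
```

Following predecessors and successors separately, rather than the whole connected component, is a design choice of this sketch: after a split, it avoids pulling in sibling concepts that merely share an ancestor with the queried term.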


Fig. 2. Example of evolution of the concept Spinal Muscular Atrophy in MeSH. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Building an historical knowledge graph from several versions of the MeSH concept Spinal Muscular Atrophy. The solid arrows represent the hierarchical relationship, the dotted arrows represent the evolutionary relationship.

We collected medical documents annotated with MeSH terms at different moments in time (2009 and 2016) and created our reference database. We used a different version of MeSH to create one set of queries. For instance, we used queries with terms from 2016 to search for documents annotated with terms from 2009. For each term used in the queries, we looked at the HKG to identify its evolutionary path. Then, we created a second set of queries using all terms from the evolutionary path. Finally, we compared the results of applying each set of queries to the database.

The second use case was designed to test whether the HKG can support the maintenance of semantic annotations. The goal is to evaluate whether the capacity to automatically adapt annotations is impacted by the evolution of the ontology. The evaluation strategy takes the initial term in the HKG and follows its evolutionary path; the term at the end of the path is proposed as a replacement candidate. Then, we compare the candidates with the terms actually adopted by the domain expert to measure the precision of our approach. Note that the evaluation period is from 2014 to 2016. Fig. 4 shows the evolution of the annotation “spinal muscular atrophy” to “spinal and bulbar muscular atrophy” induced by the specialization of the concept D009134 (MeSH 2009AA) into the concept D020966 (MeSH 2016AA). The annotation model illustrated in this figure has the following format: annotations are represented as Nodes with a set of data properties composed of the “Label” (preferred term), “vers” (version), “code” (the concept's code in the ontology), and “rel” (we consider in this example that rel is always of the “Equivalent” type and points to the part of the document that the annotation refers to).
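The adaptation strategy of the second use case can be sketched as follows. The sketch is an illustration under assumptions: annotations are plain dictionaries with the four properties named above, versions are written as integers instead of UMLS release names, the function names are hypothetical, and the evolutionary chain D009134 → D055534 → D020966 is taken from the running example. Precision is computed as the share of proposed replacements that match the expert's choice.

```python
def end_of_path(edges, vid):
    """Follow evolutionary edges forward from `vid` and return the final vertices."""
    successors = {}
    for u, v, rel in edges:
        if rel == "evolutionary":
            successors.setdefault(u, []).append(v)
    ends, seen, stack = set(), {vid}, [vid]
    while stack:
        node = stack.pop()
        following = successors.get(node, [])
        if not following and node != vid:
            ends.add(node)
        for nxt in following:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(ends)


def adapt(annotation, labels, edges, target_version):
    """Propose replacement annotations; an empty list means no evolution was found."""
    start = f"{annotation['code']}_{annotation['vers']}"
    proposals = []
    for vid in end_of_path(edges, start):
        code, _, _ = vid.rpartition("_")
        proposals.append({"Label": labels[vid], "vers": target_version,
                          "code": code, "rel": annotation["rel"]})
    return proposals


def precision(proposed_codes, expert_codes):
    """Share of proposed replacement codes that match the expert's choice."""
    if not proposed_codes:
        return 0.0
    hits = sum(1 for key, code in proposed_codes.items()
               if expert_codes.get(key) == code)
    return hits / len(proposed_codes)


# Evolutionary chain of the running example (hierarchical edges omitted).
edges = {("D009134_2009", "D055534_2010", "evolutionary"),
         ("D055534_2010", "D020966_2013", "evolutionary")}
labels = {"D009134_2009": "spinal muscular atrophy",
          "D055534_2010": "spinal and bulbar muscular atrophy",
          "D020966_2013": "spinal and bulbar muscular atrophy"}
annotation = {"Label": "spinal muscular atrophy", "vers": 2009,
              "code": "D009134", "rel": "Equivalent"}

proposals = adapt(annotation, labels, edges, target_version=2016)
print(proposals)
```

When a concept was split, `end_of_path` returns several candidates; how to choose among them is left open here, since the text only states that the term at the end of the path is proposed.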

4.2. Datasets Four different datasets were used to assess the validity of our knowledge graph. They are of different natures, since they were used to test different use cases. 4.2.1. Terminologies Our method is applied to 10 consecutive ontology versions of four ontologies in order to build four historical knowledge graphs. We used: Medical Subject Headings (MeSH), NCI thesaurus, SNOMED CT, and International Classification of Disease 9-CM versions 2005AA to 2016AA (excluding the AB versions), extracted from UMLS and transformed by ourselves into OWL files. To compute the difference between the terminologies, we use the COnto-Diff application [28]. 4.2.2. Reference datasets Since no annotation baseline exists representing annotated EHRs in different periods, we had to build our own corpus of reference. Fig. 5 illustrates the methodology we followed: 1. We first selected the Electronic Health Records (EHR) from the TREC Clinical Decision Support Track version 2014.4 In this corpus, each row has one domain-specialist opinion indicating whether a document is related to an EHR. The value 0 indicates no relation, while 2 means a strong relation. In our dataset, we utilized all rows that had a 4 http://www.trec-cds.org/2014.html

Please cite this article as: S.D. Cardoso, M. Da Silveira and C. Pruski, Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies, Knowledge-Based Systems (2020) 105508, https://doi.org/10.1016/j.knosys.2020.105508.

6

S.D. Cardoso, M. Da Silveira and C. Pruski / Knowledge-Based Systems xxx (xxxx) xxx

Fig. 4. Example of semantic annotation adaptation.

2.

3.

4.

5.

value equal to two in order to avoid narrow or non-related documents. The document identifier present in the TREC corpus is associated with the documents available on the PubMed Central (PMC) repository, i.e., using this identifier we can retrieve a document from the PMC website, e.g., document 1180830.⁵ These documents represent the subjects (drug descriptions, disease definitions, etc.) related to the EHRs, but no metadata are associated with them. Thus, we had to use a converter,⁶ which translates the PMC identifier into a PMID identifier. This PMID identifier allows us to retrieve the annotations from the MEDLINE/PubMed Baseline. The MEDLINE/PubMed Baseline⁷ is a dataset of annotations related to PubMed papers. These annotations are generated each year and represent a static view of the data each time a baseline is released. In other words, all documents are reannotated every year with the latest version of MeSH.⁸ In our corpus, we use the annotations from versions 2014 and 2016, gathered by associating the PMID identifiers from the previous phases with the MEDLINE baseline.

After gathering all annotations, we built a knowledge base (KB). The KB is composed of records of the form [Concept, EHRs, Years], which indicate which concept is related to which EHR in a specific year. We used it to simulate an environment where the terms of the query come from a different version than the one used to annotate the EHRs.

Using the annotations collected, we built a set of reference queries to evaluate our method. We first grouped together a set of documents from the MEDLINE annotations and calculated the difference between their annotations. To illustrate what this difference looks like, Fig. 6 includes an extract of our dataset. It contains the annotation [Raloxifene:D020849] created in 2014 and referring to document 2544368 (EHR 29). The computed difference shows that in the second year (2016), there was a change in the attribute value and the annotation became [Raloxifene Hydrochloride:D020849].

The common annotations correspond to the data present in both years. Since we were searching for differences, the common annotations were discarded from our queries. To generate our queries, we used the term (from the annotation diff), the year (version) in which the term is looked up in our KG, the year in which the documents are retrieved from the KB built in the previous phase, and the EHR IDs. For instance, at the bottom of Fig. 5, we elaborated a query with the term Sleep Disorders, which starts interacting with the KG in version 2016 and retrieves the associated EHRs [14,17,23] from the simulated KB in version 2014.
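The query construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`annotations_by_ehr`, `build_queries`), the dictionary keys and the "Osteoporosis" record are assumptions introduced for the example; only the Raloxifene records for EHR 29 come from the text.

```python
from collections import defaultdict

# Toy KB records (concept label, EHR id, year). The Raloxifene rows follow the
# example of Fig. 6; the "Osteoporosis" rows are hypothetical and only serve to
# show a common annotation being discarded.
KB = [
    ("Raloxifene", 29, 2014),
    ("Raloxifene Hydrochloride", 29, 2016),
    ("Osteoporosis", 29, 2014),
    ("Osteoporosis", 29, 2016),
]


def annotations_by_ehr(kb, year):
    """Group the concepts used in a given year by EHR id."""
    grouped = defaultdict(set)
    for concept, ehr, record_year in kb:
        if record_year == year:
            grouped[ehr].add(concept)
    return grouped


def build_queries(kb, term_year, doc_year):
    """Build one query per term found in term_year but not in doc_year."""
    term_side = annotations_by_ehr(kb, term_year)
    doc_side = annotations_by_ehr(kb, doc_year)
    ehrs_per_term = defaultdict(set)
    for ehr, concepts in term_side.items():
        # Common annotations are dropped; only the difference is kept.
        for concept in concepts - doc_side.get(ehr, set()):
            ehrs_per_term[concept].add(ehr)
    return [
        {"term": term, "kg_year": term_year, "kb_year": doc_year,
         "ehr_ids": sorted(ehrs)}
        for term, ehrs in sorted(ehrs_per_term.items())
    ]


forward = build_queries(KB, term_year=2014, doc_year=2016)
backward = build_queries(KB, term_year=2016, doc_year=2014)
print(forward)
print(backward)
```

Each resulting query keeps the four ingredients listed in the text: the term, the KG version to start from, the KB version holding the documents, and the expected EHR IDs.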

⁵ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1180830/
⁶ https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/
⁷ https://mbr.nlm.nih.gov/
⁸ Observe that if a single descriptor disappears in a newer version of MeSH, documents described by that descriptor do not become unrecoverable.

The dataset is available at https://git.list.lu/ELISA/AnnotationDataset. It contains a total of 250 queries to evaluate the scenario.

The other dataset concerns the use case on the maintenance of semantic annotations. We use a subset of the annotations that were produced in our previous work [14]. We randomly selected 500 annotations generated with the 2010AA version of the four terminologies mentioned (around 125 annotations from each) and asked three experts to manually validate/correct the evolution of the 500 selected annotations, according to the 2016AA version of the corresponding terminology. Each expert validated one-third of the annotations without discussing them with the others. The consolidated outcomes constitute our second reference set, which can be downloaded from https://git.list.lu/ELISA/AnnotationDataset.

4.3. Experimental assessment

In this section, we describe the methods applied to evaluate the impact of an HKG on three use cases. Regarding use case 1 (information retrieval), we wanted to answer the following question: Which documents are related to "this" specific term? To avoid bias in the query, we defined a simple query format that searches for exact matches of the term in the metadata. The baseline query contains the terms provided by the user, while the extended query contains a disjunction of the terms provided by an HKG. In this analysis, we were not evaluating whether we could find the right document (precision of the answers), because all resulting documents contain the searched term (precision is 100%); instead, we were interested in knowing whether any documents were missing (recall of the answers). Thus, we measured the recall of both searches. We did this analysis for two situations:

• Forward: querying documents stored in 2016 (annotated with terms from MeSH version 2016), using terms from MeSH version 2014 in the queries (167 queries);
• Backward: querying documents stored in 2014, using terms from MeSH version 2016 in the queries (83 queries).
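The comparison between the baseline query and the extended (disjunctive) query can be illustrated with a small sketch. It assumes a toy in-memory index and a hand-written list of expanded terms standing in for the output of the HKG; the names `search` and `recall`, the index content and document 30 are hypothetical.

```python
def search(index, terms):
    """Exact-match search: keep documents whose metadata contains any term."""
    wanted = set(terms)
    return {doc_id for doc_id, labels in index.items() if labels & wanted}


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 1.0
    return len(retrieved & relevant) / len(relevant)


# Forward situation: documents annotated with 2016 labels, query term from 2014.
index_2016 = {
    29: {"Raloxifene Hydrochloride"},  # label taken from the Fig. 6 example
    30: {"Osteoporosis"},              # hypothetical unrelated document
}
relevant = {29}

baseline_terms = ["Raloxifene"]
# Assumed HKG output: the original term plus its evolved label.
extended_terms = ["Raloxifene", "Raloxifene Hydrochloride"]

baseline_recall = recall(search(index_2016, baseline_terms), relevant)
extended_recall = recall(search(index_2016, extended_terms), relevant)
print(baseline_recall, extended_recall)
```

Because every retrieved document contains at least one of the queried terms, precision is not informative in this setting, which is why only recall is compared.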

Algorithm 2 implements the "Forward" method. It shows how we searched for the history of a concept (its evolutionary path) starting from a moment in time prior to the final concept (e.g. from 2009 to 2016). This algorithm can easily be adapted to implement the "Backward" method and find the inverse path (i.e. from 2016 to 2009). The inputs are: the graph (HKG), the concept from the query (c) and a vector with a start and end date (years) indicating the validity period. The algorithm begins by evaluating whether the concept c is valid until the end date. If c exists until the end date of the search period years, then the path is complete and the concept is returned. If not, the algorithm tests whether c is the root of the graph and returns null if so (lines 1–4). These are the stop conditions of our recursive algorithm. While the path is not complete, the algorithm goes

Please cite this article as: S.D. Cardoso, M. Da Silveira and C. Pruski, Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies, Knowledge-Based Systems (2020) 105508, https://doi.org/10.1016/j.knosys.2020.105508.


Fig. 5. Methodology to build the corpus of reference of annotated EHRs.

Fig. 6. Example of the differences computed between terms from MEDLINE annotations; the first year refers to 2014 and the second year to 2016.

Algorithm 2: EvolutionPath(Concept c, String year) (Traversing the HKG to extract the history of a concept)

    Input: HKG: G; init_term: c ∈ G; period: years;
    Output: evolutionary path: [c] ⊂ G
 1  if isValid(c, years) then
 2      return c
 3  else if c == Thing then
 4      return null
 5  forall directEvolution(c) do
 6      nextc ← getDirectEvolution(c)
 7      tmp ← EvolutionPath(nextc, years)
 8      if tmp == null then
 9          return null
10  forall directSuperClass(c) do
11      sc ← getDirectSuperClass(c)
12      tmp ← EvolutionPath(sc, years)
13      if tmp == null then
14          return null
15  return null

through the HKG recursively using evolutionary relations linked to c (lines 5–9). If no such relationships exist, the algorithm selects the next concept (nextc) linked through a hierarchical relationship (lines 10–14) and starts again to search for an evolutionary path.
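The traversal described above can be sketched in Python as follows. This is an illustrative reading of the textual description and not the authors' code: it assumes that a non-null result of a recursive call is propagated back as the path, it adds a visited set to guard against cycles, and it falls back to super-classes when the evolutionary links fail. The `Node` structure, the validity years and the toy graph are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

ROOT = "Thing"


@dataclass
class Node:
    valid_from: int                                          # first year the concept exists
    valid_to: int                                            # last year the concept exists
    evolves_to: List[str] = field(default_factory=list)      # evolutionary relations
    super_classes: List[str] = field(default_factory=list)   # hierarchical relations


def is_valid(node: Node, years: Tuple[int, int]) -> bool:
    """A concept closes the path if it still exists at the end of the period."""
    _, end = years
    return node.valid_from <= end <= node.valid_to


def evolution_path(hkg: Dict[str, Node], label: str, years: Tuple[int, int],
                   seen: Optional[Set[str]] = None) -> Optional[List[str]]:
    """First-fit forward search: evolutionary links first, then super-classes."""
    seen = set() if seen is None else seen
    if label in seen:            # cycle guard (an addition of this sketch)
        return None
    seen.add(label)
    node = hkg[label]
    if is_valid(node, years):    # stop condition: path complete
        return [label]
    if label == ROOT:            # stop condition: root reached without success
        return None
    for successor in node.evolves_to + node.super_classes:
        tail = evolution_path(hkg, successor, years, seen)
        if tail is not None:     # first path found is returned
            return [label] + tail
    return None


# Toy graph with assumed validity periods.
hkg = {
    "Thing": Node(2009, 2016),
    "Raloxifene": Node(2009, 2015, evolves_to=["Raloxifene Hydrochloride"],
                       super_classes=["Thing"]),
    "Raloxifene Hydrochloride": Node(2016, 2016, super_classes=["Thing"]),
}
print(evolution_path(hkg, "Raloxifene", (2014, 2016)))
```

The backward variant would follow the inverse evolutionary relation and, for hierarchical steps, the more specialized concepts instead of the super-classes.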

The type of hierarchical relationship used depends on the search method (forward or backward). Since we are advancing in time (forward), we always select the more abstract concept (sc) when we pass through a hierarchical relation (lines 10–11), but if we were regressing in time (backward), we would use the more specialized concept. This simplified procedure was adopted after observing that classical search algorithms like A* take too long to find a path, considering the size of the KGs that we are analyzing.

Fig. 7 presents the recall for the information retrieval analysis. The results show that, for a period of five years and for randomly selected terms that evolved, using an HKG improves the recall of the search by 2% for the forward search and by 0.5% for the backward search, indicating a potential added value of the HKG for the information retrieval case.

Regarding the second use case, two sets of experiments were conducted. The questions for this use case are: Is there a path in the HKG that validates the evolution of a concept? And can we automatically determine whether a semantic annotation will evolve and which concept will be used at that moment in time? To answer the first question, we need to evaluate the connective property of our HKG, i.e., whether two concepts, used to annotate the same text at two different moments in time, are somehow connected in the HKG. Algorithm 2 was implemented for this purpose. The inputs of the algorithm are the two terms used to annotate the same piece of text in our dataset (one term used in 2014 and the other in 2016). The output is one evolutionary path between them (when it exists). The recall ranges


Fig. 7. Use case 1, Recall of the information retrieval. Baseline consists in a simple match query.

Fig. 9. Precision for the second experiments of the second use case.

Fig. 8. Recall obtained when searching for evolutionary paths between two terms used to annotate the same document, but at different periods of time.

from 19% to 56%. One can note that the recall depends on the size and complexity of the KG. The simplifications that we made in the search algorithm are the only reason for the low recall (cf. Section 4.4). Fig. 8 shows the results we obtained. On average, in 38% of cases we found a path in an acceptable time (<1 s).

To answer the second question, we implemented a set of experiments that aimed to evaluate the ability of the HKG to provide the right information to maintain semantic annotations. This is a typical case where the user knows the old version of the concept and wants to know the newest one. The inputs of our algorithm are the label of the concept used to annotate a document and the year of annotation (i.e. 2014). The algorithm follows an evolutionary path and returns the label of the concept found in 2016 (if there is no evolutionary relation in the HKG, the algorithm gives back the same concept label). We compared the outcome of the algorithm with the evolved annotation provided by domain experts. Fig. 9 shows the value of precision, which ranges from 26% to 70%. The precision obtained varied from one terminology to another. The worst results were obtained for NCIt (26%). We explain those results by the fact that NCIt contains several identical attribute values for different concepts. For instance, "Salt" is the label of the concept with the code "C822" but also of the concept with the code "C29974". This ambiguity can be a source of errors during the construction of the dataset, since wrong initial concepts (generated by the annotation tools) could have been shown to experts, requiring corrections that are not caused by the ontology evolution.

Another aspect that impacted the precision is the way in which we implemented our search algorithm (cf. Algorithm 2). The adopted heuristic simplifies the path search process and avoids going through all possible paths, which reduces the possibility of finding the correct evolved concept. We implemented this heuristic to reduce the searching delay, but the gain in execution time comes at the cost of the precision of the results. Finding a good balance between both criteria is necessary; to that end, the size and complexity of the ontologies must be taken into account. We expect to improve the precision of semantic annotation maintenance by combining an HKG with the direct maintenance method described in [31]. The latter method uses external sources of information like Bioportal⁹ to maintain semantic annotations. Although promising, this method faces the problem of the availability of external sources, i.e. portals like Bioportal do not exist for all domains or do not have all versions of the ontologies.

4.4. Discussion

Besides providing information about the evolution of ontologies, we observed during our experiments that our HKG also saves storage space (cf. Table 1). Although this can be understood intuitively by analyzing Fig. 3, Table 1 contains the computed statistics for the four HKGs we built for our assessments. In this table, the first line shows the total amount of storage space required if the ten versions of each ontology (SNOMED-CT, NCIt, MeSH and ICD9-CM) are saved independently. For instance, the ten versions of MeSH use 229 MB of storage. The second line shows the size of the file when the ten versions are transformed into the HKG format. The percentage of space saved using our method is shown in the third line, where we apply the equation 1 − HKG/All, i.e. one minus the second line of the table divided by the first line. For instance, the space used for storing MeSH was reduced by 76% (1 − 54/229).

Similarly to the storage space, we observed that the total number of ontology classes was significantly reduced. In Table 1, we added a column with the total number of classes for all versions of each ontology (first line), as well as the number of individuals of type Node in the HKG (second line). Note that individuals of type Node in the HKG are equivalent to classes in the other ontologies. The gain is significant, e.g. for MeSH the number of classes was reduced from 309 850 to 30 749 (90%). This is close to the total number of classes of a single version of MeSH, e.g. MeSH 2016 has 27 968 classes. Reducing the total number of entities should have a positive impact on the search algorithm because it limits the search space. The evaluation of this aspect will be part of future work; thus, this positive impact remains a hypothesis based on our intuition.

The quality of the results we obtained for the various use cases depends on the properties of the HKG itself or the way we use its content.
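As a worked check of the saving-rate equation 1 − HKG/All, the short sketch below recomputes the storage rates from the file sizes reported in Table 1. The dictionary layout and the function name `saving_rate` are illustrative choices; the sizes themselves are the values given in the table.

```python
# File sizes in MB from Table 1: (all ten versions stored separately, HKG).
sizes_mb = {
    "SNOMED-CT": (1517, 700),
    "NCIt": (275, 121),
    "MeSH": (229, 54),
    "ICD9-CM": (88, 37),
}


def saving_rate(all_versions: float, hkg: float) -> float:
    """Share of space saved by the HKG: 1 - HKG/All."""
    return 1 - hkg / all_versions


for name, (all_mb, hkg_mb) in sizes_mb.items():
    print(f"{name}: {saving_rate(all_mb, hkg_mb):.0%}")
```

The same function applies to the class counts, e.g. `saving_rate(309850, 30749)` for MeSH.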
⁹ http://bioportal.bioontology.org

Table 1
Line one shows the size of the OWL file (in MB) and the total number of classes for the ten versions of four different datasets: SNOMED-CT, NCIt, MeSH, ICD9-CM. The size of the file and the number of individuals (of type node) for each HKG are presented in line 2. The storage saving rate (1 − HKG/All) and the class reduction rate are shown in line 3.

        SNOMED-CT             NCIt                 MeSH                 ICD9-CM
        Size     Class        Size    Class        Size    Class        Size    Class
All     1517     3 714 877    275     844 747      229     309 850      88      261 194
HKG     700      893 983      121     207 594      54      30 749       37      44 089
Rate    54%      76%          56%     76%          76%     90%          58%     83%

We adopted several hypotheses that simplified the experiments and the construction of the HKG. The first hypothesis considers that changes affecting annotations involve only concepts in the neighborhood, i.e. the concept used for the annotation, its super-concepts, its sub-concepts and its sibling concepts. This hypothesis comes from [12], where the authors evaluated the changes in biomedical ontologies. However, moving one concept's attribute to another concept outside of its neighborhood is considered by our algorithms as not affecting the annotation. Attributes are important elements when selecting concepts during the annotation process. This first hypothesis reduces the expected quality of the outcomes obtained, but saves time by avoiding searching the whole graph for the consequences of each change in the attributes of concepts.

In the second hypothesis, we assume that there is a one-to-one mapping between the UMLS CUI¹⁰ code and the code used internally by each ontology. In reality, this is not always the case. We simplified our algorithm to randomly choose one code among the potential candidates. This choice also impacted the expected quality of our outcomes.

The third hypothesis directly affects the outcomes of use case 3. We implemented the first-fit method, where the algorithm stops searching for concept candidates as soon as it finds the first path. In reality, as shown in Fig. 2, several evolutionary paths are possible when concepts are split or merged. Not considering all paths impacts the expected quality of the outcomes.

Regarding information retrieval (use case 1), using an HKG improves the search recall. As this use case was designed for physicians interested in retrieving documents in patient records, the objective was to retrieve all relevant patient information, i.e. to improve the recall. Note that since we used the exact match criterion to filter the results, the precision of the algorithm is always 100%. In fact, the additional keywords added to the initial query in a disjunctive manner ensured that the documents retrieved contain at least one keyword of the query in their metadata.
The recall could be improved if the information extracted from the HKG about evolution was used to generate more complex queries. The enrichment of queries used to search for documents over time can also be enhanced by the content of the HKG. To this end, the refinement of the evolutionary relationships into more precise ones, specifying the kind of evolution that affects a concept, will make the content of the HKG richer. In this case, additional keywords could be selected to enrich the query depending on the type of the evolutionary relation that links the versions of the concept over time. For instance, a split or merge of a concept means that the affected element becomes either more general or more specific, thus impacting the search (i.e. increasing the level of abstraction favors recall over precision). In addition, the specialization of the evolutionary relation will allow the design of different strategies to go through the graph. Since we favored recall, we went through the HKG only according to the relationships that made the concepts more abstract. Conversely, if precision is more important, the selection of appropriate terms from the HKG should be done by following the evolutionary relation that turns general concepts into more precise ones over time (cf. Fig. 2 showing the evolution of the concept "Spinal Muscular atrophy" in MeSH from 2009 to 2010).

In use case 2, we search for a path that links two concepts. In current experiments, we are analyzing how these paths can be used to evaluate the quality of annotated texts. For instance, one of our initial hypotheses was that an evolution of an annotation occurs when there are changes in the ontology that are close to (or within) the concept used to annotate the text. So, if the annotations are not updated, can we estimate how outdated they become? If the path does not pass through an evolutionary relation, can we infer that the original annotation was wrong? We are still searching for patterns that can answer these questions.

An HKG can also support the direct maintenance of semantic annotations. As shown in use case 3, it is possible to determine the evolution of the annotations by observing how the ontology evolves. In our current work, we are evaluating the association of machine learning and our HKG to check whether the precision of our results can be improved. In this case, we also consider different types of changes and search for patterns that can explain how the annotation evolved.

¹⁰ https://www.nlm.nih.gov/research/umls/new_users/online_learning/Meta_005.html

5. Conclusion

Representing and exploiting ontology evolution is still an open research question. In this paper, we have proposed a method to construct an historical knowledge graph that contains information about a given ontology. We have shown the added value of such a graph on different use cases such as medical information retrieval and maintenance of semantic annotations. We are currently working on the algorithm that computes the difference between two ontologies, as well as on the characterization of the evolution.
We expect to solve some of the limitations of use case 2. We are testing methods to detect probabilistic evolutionary relations that will allow us to create evolutionary relations between versions of concepts that are not in the same region of the ontology. Future work includes the adoption of machine-learning techniques to predict evolutionary relations; the improvement of the HKG by considering the type of evolutionary relationship, making its content richer and providing an additional strategy to exploit the graph; and the definition of strategies to measure the quality of the semantic annotations after the evolution of the ontology used to create the annotations.

Author Contributions

All authors have contributed equally to this work.

Acknowledgment

This work was funded by the Luxembourg National Research Fund (FNR) through the ELISA project [grant number C14/IS/824 9162].

References

[1] A. Singhal, Introducing the knowledge graph: Things, not strings, 2012, https://googleblog.blogspot.co.at/2012/05/introducing-knowledge-graph-things-not.html, (Accessed Sept 2018).
[2] H. Wang, Y. Yang, B. Liu, H. Fujita, A study of graph-based system for multi-view clustering, Knowl.-Based Syst. 163 (2019) 1009–1019.
[3] R.V. Guha, D. Brickley, S. Macbeth, Schema.Org: Evolution of structured data on the web, Commun. ACM 59 (2) (2016) 44–51, URL http://doi.acm.org/10.1145/2844544.


[4] M. Esposito, E. Damiano, A. Minutolo, G. De Pietro, H. Fujita, Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering, Inform. Sci. 514 (2020) 88–105.
[5] A. Ben Abacha, M. Da Silveira, C. Pruski, Medical ontology validation through question answering, in: Conference on Artificial Intelligence in Medicine in Europe, Springer, 2013, pp. 196–205.
[6] C. Bizer, T. Heath, T. Berners-Lee, Linked data: The story so far, in: Semantic Services, Interoperability and Web Applications: Emerging Concepts, IGI Global, 2011, pp. 205–227.
[7] D. Dinh, J.C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Identifying relevant concept attributes to support mapping maintenance under ontology evolution, J. Web Semant. 29 (2014) 53–66, Life Science and e-Science, URL http://www.sciencedirect.com/science/article/pii/S1570826814000444.
[8] J.C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, DyKOSMap: A framework for mapping adaptation between biomedical knowledge organization systems, J. Biomed. Inform. 55 (2015) 153–173.
[9] A. Groß, C. Pruski, E. Rahm, Evolution of biomedical ontologies and mappings: Overview of recent approaches, Comput. Struct. Biotechnol. J. 14 (2016) 333–340.
[10] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L.J. Goldberg, K. Eilbeck, A. Ireland, C.J. Mungall, et al., The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration, Nature Biotechnol. 25 (11) (2007) 1251.
[11] N.F. Noy, N.H. Shah, P.L. Whetzel, B. Dai, M. Dorf, N. Griffith, C. Jonquet, D.L. Rubin, M.-A. Storey, C.G. Chute, et al., BioPortal: ontologies and integrated data resources at the click of a mouse, Nucleic Acids Res. 37 (Suppl. 2) (2009) W170–W173.
[12] J.C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Understanding semantic mapping evolution by observing changes in biomedical ontologies, J. Biomed. Inform. 47 (2014) 71–82.
[13] J.C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître, Characterizing semantic mappings adaptation via biomedical KOS evolution: A case study investigating SNOMED CT and ICD, in: AMIA Annual Symposium Proceedings, Vol. 2013, 2013, p. 333.
[14] S.D. Cardoso, C. Pruski, M. Da Silveira, Y.-C. Lin, A. Groß, E. Rahm, C. Reynaud-Delaître, Leveraging the impact of ontology evolution on semantic annotations, in: European Knowledge Acquisition Workshop, Springer, 2016, pp. 68–82.
[15] G. Wade, S.T. Rosenbloom, The impact of SNOMED CT revisions on a mapped interface terminology: Terminology development and implementation issues, J. Biomed. Inform. 42 (3) (2009) 490–493, Auditing of terminologies, URL http://www.sciencedirect.com/science/article/pii/S1532046409000446.
[16] O. Bodenreider, The unified medical language system (UMLS): integrating biomedical terminology, Nucleic Acids Res. 32 (Suppl. 1) (2004) D267–D270.
[17] C. Jonquet, A. Toulet, E. Arnaud, S. Aubin, E.D. Yeumo, V. Emonet, J. Graybeal, M.-A. Laporte, M.A. Musen, V. Pesce, et al., AgroPortal: A vocabulary and ontology repository for agronomy, Comput. Electron. Agric. 144 (2018) 126–143.
[18] B. Salzberg, V.J. Tsotras, Comparison of access methods for time-evolving data, ACM Comput. Surv. 31 (2) (1999) 158–221.
[19] D. Caro, M.A. Rodríguez, N.R. Brisaboa, Data structures for temporal graphs based on compact sequence representations, Inf. Syst. 51 (C) (2015) 1–26.
[20] T. Kirsten, M. Hartung, A. Groß, E. Rahm, Efficient management of biomedical ontology versions, in: R. Meersman, P. Herrero, T. Dillon (Eds.), On the Move to Meaningful Internet Systems: OTM 2009 Workshops, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 574–583.
[21] A.G. Labouseur, J. Birnbaum, P.W. Olsen Jr., S.R. Spillane, J. Vijayan, J.H. Hwang, W.-S. Han, The G* graph database: Efficiently managing large distributed dynamic graphs, Distrib. Parallel Databases 33 (4) (2015) 479–514.
[22] N.-T. Le, B. Vo, L.B. Nguyen, H. Fujita, B. Le, Mining weighted subgraphs in a single large graph, Inform. Sci. (2019).
[23] V.Z. Moffitt, J. Stoyanovich, Towards sequenced semantics for evolving graphs, in: EDBT, 2017, pp. 446–449.
[24] K. Semertzidis, E. Pitoura, Durable graph pattern queries on historical graphs, in: 2016 IEEE 32nd International Conference on Data Engineering, ICDE, 2016, pp. 541–552.
[25] U. Khurana, A. Deshpande, Efficient snapshot retrieval over historical graph data, in: 2013 IEEE 29th International Conference on Data Engineering, ICDE, 2013, pp. 997–1008.
[26] U. Khurana, A. Deshpande, Storing and analyzing historical graph data at scale, 9th International Conference on Extending Database Technology, 2016, pp. 65–76.
[27] S.D. Cardoso, C. Pruski, M. Da Silveira, Supporting biomedical ontology evolution by identifying outdated concepts and the required type of change, J. Biomed. Inform. 87 (2018) 1–11, URL http://www.sciencedirect.com/science/article/pii/S1532046418301680.
[28] M. Hartung, A. Groß, E. Rahm, Conto–diff: generation of complex evolution mappings for life science ontologies, J. Biomed. Inform. 46 (1) (2013) 15–32.
[29] C. Pruski, N. Guelfi, C. Reynaud, Adaptive ontology-based web information retrieval: The target framework, Int. J. Web Portals 3 (3) (2011) 41–58.
[30] S.D. Cardoso, C. Reynaud-Delaître, M. Da Silveira, C. Pruski, Combining rules, background knowledge and change patterns to maintain semantic annotations, in: AMIA Annual Symposium Proceedings, Vol. 2017, American Medical Informatics Association, 2017, p. 505.
[31] S. Cardoso, C. Reynaud-Delaître, M. Da Silveira, Y.-C. Lin, A. Gross, E. Rahm, C. Pruski, Evolving semantic annotations through multiple versions of controlled medical terminologies, Health Technol. 8 (5) (2018) 361–376.
