A methodology for the semi-automatic creation of data-driven detailed business ontologies


ARTICLE IN PRESS Information Systems 35 (2010) 758–773

Contents lists available at ScienceDirect

Information Systems journal homepage: www.elsevier.com/locate/infosys

Antonio Paredes-Moreno (a), Francisco J. Martínez-López (b,*), David G. Schwartz (c)

(a) Department of Financial Economics and Operations Management, Business Faculty, University of Seville, Sevilla, Spain
(b) Department of Marketing, Business Faculty, University of Granada, Granada, Spain
(c) Information Systems Division, Graduate School of Business Administration, Bar-Ilan University, Israel

Article info

Article history: Received 29 May 2009. Accepted 14 March 2010. Recommended by: F. Carino Jr.

Abstract

In the context of technological expansion and development, companies feel the need to renew and optimize their information systems as they search for the best way to manage knowledge. Business ontologies within the semantic web are an excellent tool for managing knowledge within this space. The proposal in this article consists of a methodology for integrating information in companies. The application of this methodology results in the creation of a specific business ontology capable of semantic interoperability. The resulting ontology, developed from the information system of specific companies, represents the fundamental business concepts, thus making it a highly appropriate information integration tool. Its level of semantic expressivity improves on that of its own sources, and its solidity and consistency are guaranteed by means of checking by current reasoning tools. An ontology created in this way could drive the renewal processes of companies' information systems. A comparison is also made with a number of well-known business ontologies, and similarities and differences are drawn, highlighting the difficulty in aligning general ontologies to specific ones, such as the one we present.

© 2010 Elsevier B.V. All rights reserved.

Keywords: e-Business; Knowledge integration; Enterprise ontologies; Semantic web; Interoperability; eCommerce; Semantic of integration information; Semantic of data integration

* Corresponding author. Tel.: +34 958242350. E-mail address: [email protected] (F.J. Martínez-López).
0306-4379/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.is.2010.03.002

1. Introduction

With the first phase of Web connectivity infrastructure consolidation over, electronic commerce is now immersed in a process of vast expansion. Many companies seek to justify investment in e-commerce, and they demand improved information integration processes. The need for process optimization is urgent. More information is required, and more rapidly, to gain better quality knowledge. That is, we are in a phase that requires closer integration of business information systems to make better use of the enormous quantities of data, by means of an improved and more accurate interpretation of those data. Undoubtedly, this will bring a stark improvement in companies' knowledge generation capabilities [1].

The work we present in this paper consolidates years of experience in and around commercial enterprises, with a focus on databases, as well as on the analysis of how companies in different branches of industry use the Internet (design, electronic commerce, production, electronic administration, integration with clients, suppliers, employees, etc.). In recent years, two facts of great importance have become clear. One, companies increasingly want their information systems to perform better (e.g., [2]). Two, these systems are prone to errors, redundancies and semantic failures, which result in lower quality knowledge (e.g., [3]). Company data are often scattered over different areas, formats and systems. Such data must be managed by means of processes that are more centralized and sophisticated to exploit their information more effectively and profitably [4]. Therefore, a


transition is needed towards systems that are more efficient in their knowledge management and which can better integrate data originating in multiple places. Many companies that previously used commercially available database management systems as standalone software applications have come to manage their information resources via their local networks or the Internet by means of shared environment Intranets, some even sharing information with other companies (associates, clients, suppliers) through Extranets. Today, companies want to grow towards more powerful database management systems able to manage information from the Web, within centralized environments and with the need to integrate along their supply chains as much as possible ([5,6]). It is also known that companies see information as a highly valuable asset. To keep this information from its various sources fresh and integrated is of the utmost importance today. Therefore, applications created especially for working in these environments are necessary, as is a data vision that is new and integrative. Likewise, companies need to optimize the internal management processes of their resources, which normally leads to changes in the structures of information and the applications used. This involves making databases, tables, attributes, restrictions, etc., compatible. The need for the integration and reorganization of data sources is also evident in the case of the adoption of solutions developed by third parties, for example, SAP1 or Navision2 ([7]). The main contribution of this paper is a methodology for the development of an ontology from a set of company databases to integrate information sources for the companies and to contribute to the logical treatment and strengthening of current databases. The resulting ontology meets both commercial and managerial needs. 
An ontology produced in this way has a series of characteristics that make it highly appropriate for solving current problems of homogenization and integration revealed by the Semantic Web initiative. The ontology contributes solidity to the renewal processes through which the company modernizes its information systems at the same time that it integrates its various sources into an information model that is coherent, consistent and shared with the other associated companies and with clients and suppliers. The paper is structured as follows. First, the article presents a background of the Semantic Web from a variety of perspectives, and the semantic interoperability needed for the integration of information between systems, as well as the role of ontology in this process. Then, Section 3 explains our methodology for constructing a business ontology based on existing database systems, and shows an example of a complete ontology developed following our method. Section 4 compares the resulting ontology with other well-known business ontologies and

1 SAP AG, a global software company whose headquarters are in Walldorf, Germany; see http://www.sap.com/index.epx
2 Microsoft Dynamics NAV, a business management solution for small and medium-sized organizations; see http://www.microsoft.com/dynamics/nav/default.mspx


discusses the strengths and weaknesses of our methodology. Finally, directions for future research and conclusions are presented.

2. Background

2.1. Starting considerations

Today's business seeks to optimize information resources by making the most of the new technologies and languages that followed HTML (e.g., XML, eXtensible Markup Language; RDF, Resource Description Framework; and OWL, Web Ontology Language) to extract the most refined and accurate information possible from its systems. This information then acts as the basis for correct and effective decision-making, with minimal risk.

Alongside the new technological environment, we see the unstoppable process of globalization, which is of vital importance in business terms. Companies undergo association and absorption processes, and businesses jump across national and continental barriers. All this, together with greater profit demands, makes them very aware of the added value of correctly integrating information and sharing it with partners, shareholders, clients and suppliers [5,8].

The recent and promising research field of ontology is now of vital importance in terms of the business problems discussed above. Philosophical ontology addresses the metaphysical aspects of the nature of existence, including meanings, relationships and instances of the abstract, the concrete, the general, and the specific. It describes the basic categories and relationships of existing beings or entities, which encompass all objects, people, concepts, and ideas. In an applied sense, a business ontology models what we know about a domain or part of reality by means of concepts and relationships. Ontologies are represented by classes (categories), properties (roles) and class attributes. For a deeper understanding of the term ontology, and a general sense of the ontological aspect of information technology, the reader is referred to [9].

Reference [10] defines an ontology as "an explicit specification of a conceptualization". Borst et al. [11] extend that definition to "a formal specification of a shared conceptualization". In addition to the original definition, two important notions are used: formal and shared. Formal indicates that the ontology should be specified in a standard way so that it can be processed by a machine. Shared denotes that the conceptualization reflects a consensus rather than an individual scenario.

Our goal is to produce a formal shared ontology. Operationally, a formal shared ontology is a document or file that contains formal definitions of the concepts of a specific knowledge domain and of the relationships between them, which can then be applied in a computational setting. It contains a set of classes (a relational taxonomy of concepts and roles) and a group of axioms that enable new knowledge to be deduced.
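To make the last point concrete, the following minimal sketch shows how a small set of class axioms lets a program deduce facts that were never stated explicitly. It is only an illustration: the class names (Invoice, SalesDocument, TradeDocument) and the individual invoice_001 are hypothetical and are not taken from the ontology described in this paper, and a real OWL reasoner supports far richer axioms than simple subclass chains.

```python
# Toy knowledge base: subclass axioms plus one asserted fact.
# All class and individual names are hypothetical examples.
SUBCLASS_OF = {
    "Invoice": "SalesDocument",
    "SalesDocument": "TradeDocument",
}
ASSERTED_TYPES = {"invoice_001": "Invoice"}


def superclasses(cls):
    """Return every class reachable from cls by following subclass axioms."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def inferred_types(individual):
    """Asserted class of the individual plus all classes deduced from the axioms."""
    asserted = ASSERTED_TYPES[individual]
    return [asserted] + superclasses(asserted)


print(inferred_types("invoice_001"))
```

Only the membership of invoice_001 in Invoice is asserted; its membership in SalesDocument and TradeDocument is deduced from the axioms, which is the sense in which a formal ontology "enables new knowledge to be deduced".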


2.2. Semantic interoperability: the new challenge for firms' integration of information

The time when a company's ability to compete was based exclusively on its individual strengths is over [12]. Sectors are now made up of value constellations that compete and cooperate with each other in the form of coopetition [13]. Understanding between companies that collaborate within the same value network is critical, as its absence would lead to competitive inefficiency (see [14]). The diversity of economic agents in the global business environment extends beyond value systems and is increasingly interconnected thanks to the Internet and the generalized use of e-business applications. This means that firms must find a way to work with information systems that use languages that are compatible, homogeneous and universal in meaning. This involves resolving interoperability between systems and, specifically, semantic interoperability (see [5]). In fact, the search for semantic interoperability between systems is now a key research topic and technical challenge (see [15]).

The problems of interoperability are not new, either for systems technicians or for users. Fully solving interoperability involves crossing three dimensions of increasing complexity: technical, syntactic and semantic. Today, technical interoperability, associated with the systems' capacity to exchange signals, is solved. The same cannot be said of the other two dimensions, especially the semantic one. Syntactic interoperability is the capability of diverse systems or software components to interpret the syntax of data in the same way. It enables cooperation between systems even when they differ in language, interface and execution platform [16]; i.e., syntactic compatibility between systems. Nevertheless, as we have pointed out, complete interoperability will not exist until the semantics are guaranteed; that is, until the meaning of each term used is shared [17]. Currently, semantic heterogeneity is broad, which means that considerable effort is needed to integrate information. The development of business environments that enable the semantic management of information is therefore a necessity [5,7,18,19].

Several approaches to semantic information integration are evident (for more detail see, e.g., [20]): (1) the mapping-based approach, based on correspondence or mapping from local data sources onto a global schema (see [21,22]); (2) the intermediary-based approach, based on the use of intermediary mechanisms (mediators, agents, ontologies, etc.) (see [23–25]); and (3) the query-oriented approach, based on interoperable languages (SQL; see footnote 3) or logic-based languages (see [26,27]). As is detailed in Section 3, we pose a hybrid approach based mainly on the use of ontologies, which is particularly interesting for resolving the problem of semantic interoperability (see, e.g., [28–30]).

2.3. Previous projects on information integration – A short review

Many studies investigate the integration, merging and articulation of information through ontologies. What follows is a brief description of the most important ones. The projects whose methodology most closely fits the work in this paper are those of Fensel et al. [31] and Schwartz and Schreiber [32]. These works document the Corporate Ontology Grid (COG) project undertaken by Fiat (Italy), Unicorn (Israel) and LogicDIS (Greece), with the Institute of Computational Sciences of the University of Innsbruck (Austria) acting as consultant. The COG project investigated the problem of semantic heterogeneity among the data sources of the Fiat automobile company, and how it could be overcome by integrating these sources using a central information model, that is, an ontology. The integration problems were solved by applying the information semantics via the Unicorn Workbench tool. This tool was used to create an ontology based on schemas collected from the sources under study. These schemas were mapped onto an information model (ontology) to make the meaning of the concepts explicit and to relate them, thus creating an information architecture that provided a unified view of the organization's data sources.

The methodology presented here, although it necessarily operates in the same environment and problem area, differs markedly and achieves superior results. Given this proximity to COG, we treat it in more detail than the other projects reviewed in this section. Although the detailed presentation of our methodology comes in Section 3, we anticipate some fundamental differences here. Essentially, our approach describes a real case based on the information systems of companies that trade in varied energy-related products.
The data sources we study are all in MS Access format, and we apply the information semantics as follows:

Phase 1: We thoroughly analyse the metadata or schemas of the databases as well as the studies undertaken by technicians and experts of the company or companies in question. With the knowledge acquired, we set a partial map of concepts onto a newly created ontology. In this phase, we generate the hierarchy of classes, properties and restrictions from the semantic structure taken from the schemas and company rules of the mapped sources. We call the mapping partial because we had to reinterpret many of the schemas to give the ontology a logical and well-organized structure.

Phase 2: Contrary to the COG project, we populate the ontology with the results of real database queries. We use our own ad hoc tool (GOWL). This tool consists fundamentally of an SQL query manager that produces code in the OWL format, containing the individuals as well as their relationships with other individuals via properties and restrictions.

During the construction and refining process, the ontology was checked using logical reasoners in order to obtain a coherent and consistent information model (see footnote 4).

3 SQL: Structured Query Language.
4 For a deeper description, see Paredes-Moreno [67].


Table 1. Similarities and differences of our methodology regarding others – adapted and extended from [31], including Trading.owl.

Systems compared: COG, MOMIS, InfoSleuth, OBSERVER, KRAFT, Chimaera, ONION and ours.

Integration paradigm. Ours: new, developed ad hoc. Chimaera: merging. Other systems: combination or aligning.

Mapping pattern. Ours: one-to-one (special). Other systems: one-to-one, single-shared, clustering or n/a.

Mapping types supported. Ours: class, property, restrictions, instances, rules, axioms, constraints. COG and MOMIS: class, property, value transformation, conditional mapping. Other systems: subsets of class, property, restrictions, queries and instances, or n/a.

Degree of automation. COG and MOMIS: manual. All other systems, including ours: semi-automatic.

Inter-operability. Ours: languages supported by Protégé; import of RDBMS. COG: import of RDBMS, XML Schema, Cobol copybook, custom wrappers; (limited) export of DAML+OIL. Chimaera: any language supported by Ontolingua. InfoSleuth: import via Java templates for JDBC, text, etc.; interoperation with OKBC. Other systems: IDL, XML Schema, custom wrappers, or n/a.

Experience in projects. Ours: commercial enterprises in energy products. COG: COG project and several pilot projects. MOMIS: academic prototype, a few test projects. InfoSleuth: academic prototype, EDEN. OBSERVER: bibliographical data. KRAFT: academic prototype, network data services. Chimaera: academic prototype, HPKB project and several test projects. ONION: academic prototype, vehicle manufacturing and transportation.


MOMIS (Mediator environment for Multiple Information Sources) [33], developed at the universities of Modena, Reggio Emilia and Milano, is a framework project that pursues the extraction and integration of information from structured and semi-structured sources.

InfoSleuth [34] is a multi-agent system for semantic interoperability over heterogeneous data sources. It supports the construction of complex ontologies from small ontologies.

OBSERVER [35] is defined as an ontology-based system enhanced with relationships for the resolution of vocabulary heterogeneity.

KRAFT (Knowledge Reuse & Fusion/Transformation) [36] was set up by the universities of Aberdeen, Cardiff and Liverpool in collaboration with British Telecom. Its main purpose is to investigate the possibility of sharing and reusing information contained in databases and heterogeneous knowledge systems.

Chimaera [37,38] is a Web-based tool used to merge and diagnose ontologies. It was developed by Stanford University's Knowledge Systems Laboratory (KSL).

ONION (ONtology CompositION) [19,39] is a system centred on a sound formalism that supports a working framework capable of scaling the interoperability of ontologies. ONION uses rules that bridge semantic gaps by creating an articulation between systems.
In Table 1, we show differences and similarities between our methodology and those mentioned previously with regard to the following comparative criteria: the integration paradigm, which indicates whether it is mixing, aligning or combining ontologies (for example, having a central ontology while the source ontologies are preserved and mapped by the central ontology); a mapping pattern, the correspondence between data sources and the ontology (one-to-one, single-shared, or ontology clustering), and depending on the paradigm chosen; mapping types supported, the different types of mapping that the methodology supports: classes, properties (relationships), queries (individuals), axioms, restrictions, logic rules, etc.; interoperability, concerning above all the import/export languages supported by the methodologies; and finally, the experience acquired via the use of the tools in different projects gives us an index of their usability and utility. Generally, there is a difference in maturity (and, consequently, quality) between the tools used in the industry and those developed as academic prototypes.

3. A new business ontology methodology with semantic inter-operability capabilities

The methodology we have developed enables the construction of a business ontology to consolidate business processes, modernize information systems and at the same time integrate varied sources into an information model that is coherent, consistent and can be shared with other companies. It results in a specific ontology that is highly applicable to medium-sized and large commercial companies.


We present a case study of companies that trade energy-related products in their various forms. Through this case we present and test our methodology by deriving the ontology from the companies’ business information systems. These systems contain subsystems for offers, sales, warehousing, invoicing, accounts and treasury. Their sources follow the relational model. The comprehensive information model has tables, queries and sets of logical rules as well as norms and restrictions specific to the companies (see Figs. 2 and 3). In terms of semantic interoperability, our approach applies the information semantic in the following way. In the first phase, we analyse the metadata and schemas of the databases in the case study. With this knowledge, we fix a partial map of concepts to a newly created ontology. This first phase sees the generation of a hierarchy of classes, properties and restrictions out of the semantic structure of the company’s schemas and norms in the mapped sources. The map is partial because it was necessary to reinterpret many schemas in order to provide the ontology with a well-organized logical structure. In addition to mapping from the schema sources, we also drew on the help of experts in the domain during the creation of the taxonomy. In the second phase, we filled the ontology with real requests from the databases. We used our own tool, (GOWL: ‘‘Generator OWL code’’). This tool consists of an SQL query manager that produces OWL format code containing both the data fields (the field of a database table referenced to an entity attribute or database register) and the relationships with other data fields via their properties and restrictions. Our methodology is centred on the three aforementioned approaches – mapping the databases, the construction of a global ontology and the use of logic-based languages in its development. 
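The second phase can be illustrated with a minimal sketch of a query-to-OWL generator in the spirit of GOWL. It is not the GOWL tool itself, whose code and output format are not shown in this paper. As assumptions made only for illustration, it uses an in-memory SQLite database in place of the MS Access sources, hypothetical tables (customer, sale) and property names (soldTo), and OWL 2 functional-style syntax for the generated assertions.

```python
import sqlite3


def row_to_axioms(cls, key, row, object_props):
    """Translate one table row (a dict) into OWL functional-style assertions."""
    subject = f":{cls.lower()}_{row[key]}"
    axioms = [f"ClassAssertion(:{cls} {subject})"]
    for column, value in row.items():
        if column == key or value is None:
            continue
        if column in object_props:
            # Foreign key column: relate this individual to another individual.
            prop, target_cls = object_props[column]
            axioms.append(
                f"ObjectPropertyAssertion(:{prop} {subject} :{target_cls.lower()}_{value})"
            )
        else:
            # Ordinary column: attach the value as an (untyped) literal.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            axioms.append(f'DataPropertyAssertion(:{column} {subject} "{literal}")')
    return axioms


def export_table(conn, table, cls, key, object_props=None):
    """Query every row of a table and return the corresponding assertions.

    Table and column names are trusted constants in this sketch.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [description[0] for description in cursor.description]
    axioms = []
    for values in cursor:
        row = dict(zip(columns, values))
        axioms.extend(row_to_axioms(cls, key, row, object_props or {}))
    return axioms


# Hypothetical sample data standing in for a company's sales subsystem.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Acme Energy');
    INSERT INTO sale VALUES (10, 1, 250.0);
    """
)
axioms = export_table(conn, "customer", "Customer", "id")
axioms += export_table(conn, "sale", "Sale", "id", {"customer_id": ("soldTo", "Customer")})
print("\n".join(axioms))
```

In this sketch each row becomes an individual of the class mapped to its table, ordinary columns become data property assertions, and foreign keys become object property assertions linking individuals, which mirrors the distinction drawn above between data fields and their relationships.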
Unlike integration projects based solely on a single approach, this hybrid formula allows the exploitation of the advantages of the approach based on ontologies, offsetting some of their inconveniences with the strengths of other approaches (for a deeper discussion, see: [20]). The ontology incorporates all the characteristics of the conceptual model of the data, both those related to the concepts, or TBox (schemas, integrity restrictions and company rules) and those related to the data fields with their relationships (ABox), as the data are imported from the tables. More specifically, on the one hand, TBox in Description Logic is a ‘‘terminological component’’, a ‘‘Concept definition’’. In other words, it is the definition of a new concept in terms of other previously defined concepts. It is used to describe statements in terms of controlled vocabularies in ontologies (see [40], pp. 17– 19). On the other hand, ABox is an ‘‘assertion component’’. The ABox contains extensional knowledge about the domain of interest, i.e. assertions about individuals (see [40], pp. 19 ss). TBox and ABox statements together form a knowledge base. Likewise, the ontology expands and improves the expressivity of the firms’ databases. This improvement, together with the solidity that the logic provides, converts the ontology into a useful platform and a starting-off

Fig. 1. Steps in the construction of the ontology.

point for the development of future applications to new information models. This (future) process of returning to a new model of information from the ontology will see databases treated and enriched semantically thanks to the fundamental contribution of the ontology.

We have used OWL DL (Web Ontology Language, Description Logic sublanguage), which is based on description logic (see [40]). Description logics are a family of knowledge representation languages. They have a formal semantics based on predicate logic, and they represent an excellent jumping-off point for defining languages that can construct ontologies. These logics aid the reasoning tasks needed to support the construction, integration and evolution of ontologies. The ontology has been edited with Protégé [41] (footnote 5) and Swoop (footnote 6), and checked for consistency with the RacerPro (footnote 7) and Pellet (footnote 8) reasoning services.

3.1. Stages of the methodology

The methodology (footnote 9) we present for the development of the ontology has six stages (see Fig. 1). This structure is based on a review of the methodologies supporting significant information integration projects (see Section 2.3), although there is a closer relationship, especially in the first four stages, to the methodological process posed by Semantic Information Management (see [42]) and used in the COG project. These are our methodological stages:

1. Requirements analysis: Shared analysis of requirements. Fixing the range of the project, its starting-off point, the aims to be fulfilled and the method used for the construction.
2. Metadata collection: Collection and cataloguing of metadata from an exhaustive analysis of the relevant sources: schemas, restrictions, company rules or norms, procedures, uses or performance modes when faced with the varied circumstances associated with specific practice, etc.
3. Construction: Construction of the ontology, which includes the development of a new information model from the existing ones, following these two steps:
   a. Modelling the taxonomy of classes, properties and restrictions (TBox). For instance, in the case study used to present our methodology, Fig. 6 shows "trading.owl", the ontology's collapsed tree. Its root class is "Trade Handling", from which all other branches or classes derive.
   b. Populating the ontology with individuals obtained from queries (ABox). This is done by means of a tool created ad hoc that makes the process automatic, importing data from the table records.
4. Refinement: Rationalizing and refining the ontology, extracting the semantics from the sources and mapping them onto the ontology. The steps are:
   a. Checking the taxonomy, to find the ontology's incoherencies or contradictions via the up-to-date automatic reasoners already mentioned.
   b. Checking and treating the data items, to find axiomatic inconsistencies and make the corrections necessary to achieve consistency.
5. Testing: Comparing the resulting ontology with other relevant ontologies that already exist on the Web, which in our test case are TOVE, REA, EO, BMO and e3 value, highlighting differences, mutual deficiencies and contributions.
6. Feedback: Collection of lessons learned and selection of the best practices in integration, correction of possible errors and projections on future lines of investigation in order to make the most of our research.

By checking the ontology with logical reasoners, we arrive at a solid, consistent ontology. The model created can then be used as the basis for new applications.

5 http://protege.stanford.edu/ (consulted 10/07/2008).
6 http://www.mindswap.org/2004/SWOOP/ (consulted 10/07/2008).
7 http://www.racer-systems.com/ (consulted 10/07/2008).
8 http://www.mindswap.org/2003/pellet/, moved to http://pellet.owldl.com
9 For a deeper view, see Paredes-Moreno [67].
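The kind of check performed on data items in the refinement stage can be pictured with the following toy sketch. It is only an illustration under stated assumptions: it tests a single type of axiom (class disjointness), the classes PaidInvoice and PendingInvoice and the individuals are hypothetical, and reasoners such as RacerPro and Pellet perform far more general description logic reasoning than this.

```python
# Hypothetical company rule: an invoice cannot be both paid and pending.
DISJOINT_CLASSES = [("PaidInvoice", "PendingInvoice")]

# Hypothetical class memberships as imported from the source tables.
INDIVIDUAL_TYPES = {
    "inv_1": {"PaidInvoice"},
    "inv_2": {"PaidInvoice", "PendingInvoice"},  # conflicting source data
}


def find_inconsistencies(individual_types, disjoint_classes):
    """Return (individual, class_a, class_b) for every violated disjointness axiom."""
    problems = []
    for individual, classes in sorted(individual_types.items()):
        for class_a, class_b in disjoint_classes:
            if class_a in classes and class_b in classes:
                problems.append((individual, class_a, class_b))
    return problems


for individual, class_a, class_b in find_inconsistencies(INDIVIDUAL_TYPES, DISJOINT_CLASSES):
    print(f"{individual} is asserted to be both {class_a} and {class_b}")
```

Each reported individual points back to records in the sources that must be corrected before the ontology can be considered consistent.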
That is to say, as suggested in stage 6, once the methodology is applied, and an ontology is created and checked, it can then be used to create new information systems (i.e. coherent ontologies or databases). The most important aspects of the ontology are: (1) The ontology as repository of business knowledge. That is, we have a specific ontology that describes the commercial concepts used by the companies, both in terms of the classes and the properties or relationships between them. Also described are the restrictions imposed not only from the metadata but also from other data arising from daily use and practice that report what we could call ‘‘the company’s behavioural philosophy’’ and (2) The ontology as basis for new information systems, has all the expressivity of its sources. In addition, it is


enriched by the incorporation of company rules and restrictions and, given its basis in logic, we can assure its consistency when used as a basis for modifying and updating information systems. Finally, the methodology we propose for the particular field of application we tackle (i.e. business ontologies) is very coherent with the Design Science Research Methodology (DSRM) for Information Systems Research, recently proposed by Peffers et al. [43]; This work has significant antecedents in [44], March and Smith [45,46]. Due to the potential relevance of this work in becoming a common framework for conducting Design Science Research in Information Systems, we will now make some comparative comments. In [43] there appears a general methodology that is structured in the six following steps or stages: (1) Identify problem and motivate. A specific research problem is defined and the value of the solution is justified. (2) Define objectives of a solution. The aims of the solution are inferred from the problem definition and the knowledge of what is possible and feasible. (3) Design and development. This consists of the creation of the artifact10. Such artifacts can be constructs, models, methods or instantiations. (4) Demonstration. This means demonstrating the use of the artifact in order to resolve one or more cases of the problem. (5) Evaluation. This deals with observation and measurement of how the artifact finds a solution to the problem. (6) Communication. Communicating the problem and its importance, the artifact, its use and novelty, the rigour of its design, and its efficiency for the researchers and other relevant parties. These steps or stages are normally structured in sequential order (1–6), although researchers do not always follow this sequence according to the problem they are trying to solve and the condition of situation of the problem. 
10 An artifact is something observed in a scientific investigation or experiment that occurs as a result of the preparative or investigative procedure (Oxford Dictionary).

In what follows, we briefly describe some of the differences and similarities between the DSRM methodology and the approach we have taken as presented herein. DSRM incorporates the principles, practices and procedures required to make information systems work. It provides a model for investigation in Design Science and a mental model for presenting and evaluating such research. In effect, it is a generic methodology theoretically applicable to any field of investigation. However, the methodology we present in this paper has the analysis of a specific business reality as its starting-off point. In particular, it addresses a series of research problems within this reality, the most important of which is the integration of information. On the other hand, the purpose of our methodology is the development of a business ontology as an efficient medium for the solution of the problem of information integration as well as a model

Table 2 A comparison of Peffers et al.'s DSRM and our methodology for business ontology construction.

| DSRM process sequence | DSRM activity (from [43]) | Our process sequence | Our stage |
|---|---|---|---|
| 1 | Problem identification and motivation (p. 52). "Define the specific research problem and justify the value of a solution". | 1–2 | 1. Requirements analysis: Shared analysis of requirements. Fixing the range of the project, its starting point, the aims to be fulfilled and the method used for the construction. |
| 2 | Define the objectives for a solution (p. 55). "Infer the objectives of a solution from the problem definition and knowledge of what is possible and feasible". | 1–2 | 2. Metadata collection: Collection and cataloguing of metadata from an exhaustive analysis of the relevant sources. |
| 3 | Design and development (p. 55). "Create the artifact. Such artifacts are potentially constructs, models, methods, or instantiations (each defined broadly) or new properties of technical, social, and/or informational resources". | 3 | 3. Construction of the ontology, which includes the development of a new information model. |
| 4 | Demonstration (p. 55). "Demonstrate the use of the artifact to solve one or more instances of the problem". | 4 | 4. Refinement: Rationalizing and refining the ontology, extracting the semantics from the sources and mapping them onto the ontology. |
| 5 | Evaluation (p. 56). "Observe and measure how well the artifact supports a solution to the problem". | 5–6 | 5. Testing: Comparing the resulting ontology with other relevant ontologies. |
| 6 | Communication (p. 56). "Communicate the problem and its importance, the artifact, its utility and novelty, the rigor of its design, and its effectiveness to researchers and other relevant audiences such as practicing professionals, when appropriate". | 5–6 | 6. Feedback: Collection of teaching experiences and selection of the best practices in integration, correction of possible errors and projections on future lines of investigation. |

for creating new information systems or renewing existing ones. All this, as previously discussed, is aimed at business management. However, despite the differences in starting point and objectives, the two methodologies share significant common ground at various stages, on which we now comment. Given the generality and, thus, the wide applicability of the DSRM, our methodology could be regarded as a case of DSRM application. In particular, the third stage (design and development vs. construction of the ontology) is practically identical in both. Indeed, our computer application (GOWL) would qualify as a DSRM artifact. The first two stages ("problem identification and motivation" and "definition of the objectives for a solution" vs. "requirements analysis" and "metadata collection") specify the first phases of any project: the analysis of a real domain, its problems, and the fixing of the aims of the solution. The second stage of our methodology is very specific, since its starting point is real companies, and it uses metadata not only to identify the problems but also to define the solution. Our fourth stage, which deals with improving and refining the ontology, may involve repeating previous stages. This is provided for in the fifth stage of Peffers et al.'s DSRM, which states that researchers can decide to iterate back to activity three in order to improve the effectiveness of the artifact. Likewise, our methodology proposes stages five and six, which describe the testing of the resulting ontology against other relevant ontologies, and the possible feedback based on good practices and external experiences. These stages correspond to the fourth, fifth and sixth stages of the DSRM, as both methodologies provide for the need to demonstrate or test the investigation as well as to publish it so that it can be shared and improved.
Therefore, apart from a few minor differences of meaning, both methodologies agree on the fundamentals; they differ only in perspective and in level of generality. This agreement also supports the suitability of our methodology. Finally, Table 2 shows a side-by-side comparison of the stages followed in each methodology.

3.2. Demonstration by empirical application: A case study

Our case study is based on the information system of a company involved in the large-scale trade of energy products in their various forms, which belongs to an industrial group that is a leader in the field of new technologies. Its trading volume is over €500 million per year. It sells telecommunications products for managing tasks in the energy industry, energy transport, solar energy and supply management. Fig. 2 illustrates the structure of its commercial information system. All the commercial information makes up the universe of discourse, subdivided into domains (semantic classes or tables). Within these domains are defined the relationships whose extension takes in subsets of explicit elemental facts (literals, records, registers, queries) and general laws, normally represented by integrity or inference rules.


Fig. 2. The commercial information system.

Fig. 3 shows some of the more important tables. They follow the relational model and are implemented in MS Access databases.

Fig. 3. Tables and database relationships.

3.3. Modelling the ontology (TBox)

The semantic management of information gives us a methodology for modelling the ontology, based on the semantics of the data sources. We have taken into account the various integration methods of the projects previously mentioned, comparing their methods with the ones applied in our investigation, but with one criterion uppermost: applicability to the integration of data schemas. The main characteristics of our methodology (see Fig. 4) are the following:

(1) The integration paradigm consists of the construction of a central ontology (with possible future links to more generic ontologies), starting from an exhaustive analysis of the metadata collected from business databases and setting up a correspondence from the schemas to the concept hierarchy.
(2) The resulting correspondence pattern (mapping) between sources and ontology is very close to a one-to-one correspondence. However, although the ontology's concepts are coherent, this is not the case for all the corresponding concepts in the databases.
(3) Our methodology supports a partial mapping at class level, and a total mapping at property, individual, axiom and restriction levels.
(4) Our methodology's degree of automation lies between semi-automatic and interactive. The taxonomy is created manually with the Protégé tool, but the integration of query results is automated by our own GOWL application, which defines a series of SQL queries embedded in coded events. When these events are executed, they automatically generate OWL code files that are fully compatible with, and can be integrated into, the ontology in Protégé.
(5) The ontology supports semantic interoperability, being implemented in the standard OWL language, which is based on description logic, and having passed the logical consistency tests of the reasoners (RacerPro, Pellet).
(6) The mapping tool's visual interface is made up of graphical forms written in Visual Basic on Access. The ontology can be viewed and edited using various tools, for example Protégé, Swoop or Eclipse.
(7) The maturity of this methodology and its accompanying tools makes it a solid basis for generating new ontologies and applications based on them. It is currently being used for building new business databases on SQL Server (see Future Research).

Fig. 4. The semantic management of information.

The semantic management of information as reported by our methodology contains the layers shown in Fig. 4. The external data sources’ layer lies within the block at the bottom of the figure. It describes the place and origin of the data and metadata. In our case study, these sources are databases in a relational format (RDDBB). This layer includes correspondences between the ontology’s sources and concepts. The block at the top contains the tools and ontology construction layers. The latter describes the meaning of the data, that is, the information model.


Fig. 5. Taxonomy of trading.owl (upper level).

The ontology built following this methodology is made up of the following elements:

1. The hierarchy of classes. The concepts are set in a hierarchy of classes. The classes, as mentioned before, come from metadata schemas collected from databases. In the ontology, each element (class, property, data item) is given its own name, which should be as expressive as possible. The names of subclasses and subproperties carry prefixes that indicate their immediate parent. The names of data items are imported from the corresponding tables and, to avoid duplication, they are given a prefix indicating the class to which they belong. Fig. 5 shows an example.

2. The properties of the classes. The roles or properties of the classes, that is, their relationships, are taken and organized into a hierarchy. These can be of two types:
a. Object properties: those that relate an individual of one class to an individual of another class, that is, those whose domain and range are classes. In the methodology, different types of object property are considered (functional, inverse functional, symmetric and transitive), and these are applied to each property that forms part of the ontology's hierarchy. For example, the "have_payment" property means: the payment of invoices to the supplier is done by a payment order at a bank or financial company. The domain of the property is "supplier_invoices", and its range is "payment_order".
b. Data-type properties: those that relate data items of a class to some type of value; their domains are classes but their range is a fixed data type. The ontology only accepts the creation of object properties. The technical reason is that the automated reasoners do not currently consider data-type properties, which makes it impossible to check an ontology that contains them. To get around this problem, number and date values are represented in string format in the ontology (Fig. 6).

3. Company rules and restrictions. Next, the specific conditions imposed on the classes are generated, with the aim of modelling their meaning within the company environment (for example, suppliers' invoices are to be paid after a certain number of days, or the profit made on each sale cannot be less than a set percentage). The conditions restricting the classes can be either necessary, or necessary and sufficient.

4. Descriptions and notes. In addition to their formal description, notes or comments in natural language can be added to the classes and properties in order to make their meaning more intelligible.

3.4. Modelling the ontology (ABox)

Once the taxonomy and all its elements are created, the entities or data items are inserted and each one is matched to its corresponding class with its properties. Then the relationships between data items are generated and the properties that affect them are applied to each one. In order to automate the mapping and integration of the sources' data items, we have constructed a software tool, written in Visual Basic, whose main content is a series of SQL queries capable of translating the data in the MS Access records into OWL and integrating them into the body of the ontology. The following SQL query extracts each client from the database together with its "have_country" attribute and converts it into OWL code:

    SELECT "<Clients rdf:ID=" + '"Cli_' + [NomCli] + '"' + ">"
         + " <have_country> <Country rdf:ID=" + '"Ct_' + Clients.CountryCli + '"' + "/>"
         + " </have_country>"
         + " </Clients>"
    FROM Clients;

The result for each client from the database is the following:

    <Clients rdf:ID="Cli_InstalacionesyMontajesS.A.">
      <have_country>
        <Country rdf:ID="Ct_Spain"/>
      </have_country>
    </Clients>
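The query-to-OWL translation just described can be sketched outside Access as well. The following Python fragment is only an illustration under stated assumptions: it uses an in-memory SQLite table in place of the MS Access database, reuses the column names NomCli and CountryCli from the query above, and assumes that spaces are stripped from names before they are used as identifiers (the sample output suggests this, but the original query does not show it). The function names are hypothetical and are not part of GOWL.

```python
import sqlite3
from xml.sax.saxutils import quoteattr


def client_to_owl(name, country):
    """Render one Clients row as an OWL individual linked to its country."""
    # Assumption: identifiers are built by prefixing and removing spaces.
    client_id = "Cli_" + name.replace(" ", "")
    country_id = "Ct_" + country.replace(" ", "")
    # quoteattr escapes the value and wraps it in quotes for use as an XML attribute.
    return (
        f"<Clients rdf:ID={quoteattr(client_id)}>\n"
        f"  <have_country>\n"
        f"    <Country rdf:ID={quoteattr(country_id)}/>\n"
        f"  </have_country>\n"
        f"</Clients>"
    )


def export_clients(conn):
    """Return one OWL fragment per row of the Clients table."""
    rows = conn.execute("SELECT NomCli, CountryCli FROM Clients ORDER BY NomCli")
    return [client_to_owl(name, country) for name, country in rows]


# Stand-in for the source database (hypothetical sample row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (NomCli TEXT, CountryCli TEXT)")
conn.execute(
    "INSERT INTO Clients VALUES (?, ?)",
    ("Instalaciones y Montajes S.A.", "Spain"),
)

fragments = export_clients(conn)
print(fragments[0])
```

Building the fragment in application code rather than inside the SELECT keeps the escaping of attribute values in one place, which may matter when client names contain characters such as "&" or quotation marks.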


Fig. 6. Partial taxonomy of the ontology (agents’ branch).
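The object-property declarations described in Section 3.3 can be illustrated with a short sketch. The Python fragment below builds an RDF/XML declaration for the "have_payment" property with the domain and range given in the text. It is a minimal sketch, not the output of Protégé or GOWL: the helper name object_property is hypothetical, and marking "have_payment" as functional is an assumption made only to show how a characteristic would be attached.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)

# The four characteristics named in the text, mapped to OWL vocabulary terms.
CHARACTERISTICS = {
    "functional": "FunctionalProperty",
    "inverse functional": "InverseFunctionalProperty",
    "symmetric": "SymmetricProperty",
    "transitive": "TransitiveProperty",
}


def object_property(name, domain, range_, characteristics=()):
    """Build an owl:ObjectProperty element with domain, range and characteristics."""
    prop = ET.Element(f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}ID": name})
    for characteristic in characteristics:
        ET.SubElement(
            prop,
            f"{{{RDF}}}type",
            {f"{{{RDF}}}resource": OWL + CHARACTERISTICS[characteristic]},
        )
    ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": "#" + domain})
    ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": "#" + range_})
    return prop


# Assumption for illustration: each supplier invoice has at most one payment order.
prop = object_property(
    "have_payment", "supplier_invoices", "payment_order", ["functional"]
)
xml_text = ET.tostring(prop, encoding="unicode")
print(xml_text)
```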

The ontology constructed with this methodology possesses the characteristics of the domains and subdomains of the sources in the case study. In this context, we could say that ours is a standard ontology, as it aims to take in the entire terminology used by companies trading products related to energy, engineering and telecommunications. To this end, we have taken into account the extensive experience and knowledge inherent in the development of the databases, the applications and their maintenance in working companies.

3.5. Treating the ontology

In order to guarantee the logical coherence and soundness of the ontology, our methodology incorporates processes for detecting and repairing possible errors [47]. The RacerPro 1.9 and Pellet 1.5 logical reasoners are used for this. These carry out a complete check of the class and property hierarchy, and of the data items and their relationships. It is important to point out that the resulting ontology is totally coherent from the logic viewpoint, which means that it is free of contradictions. This coherence is necessary when retrieving information or when the ontology serves as a base for new databases or applications.

Regarding expressivity, the ontology has all the expressive power of the OWL-DL language, which is based on description logic. It formalizes the data source elements and even takes on new elements drawn from the experience and reflections of experts and users. The level of expressivity it reaches is "SHOIN(D)" in description logic terms, where:

"S" is the abbreviation for ALC with transitive roles. ALC allows concept intersection, full negation, full universal quantification, full existential quantification and concept disjunction.
"H" means role hierarchy.
"O" is for nominals (singleton sets, oneOf, for example {a}).
"I" means inverse roles (properties which have inverses specified, or properties that are symmetric).
"N" means number restrictions (cardinality restrictions and functional properties).
"D" is for datatypes (for an in-depth review, see [40]).

In sum, this means that the ontology has most of the features of description logic, of which we emphasize its capacity to: formalize concepts, attributes, roles and complements, and structure them in a hierarchy; integrate the results of database queries (individuals); incorporate negative properties and qualified restrictions; and admit various types of data.
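To give a concrete idea of what "free of contradictions" means, the following Python sketch checks one simple kind of contradiction: an individual asserted to belong to two classes that have been declared disjoint. This is a toy illustration only and is far simpler than what RacerPro or Pellet do (it ignores subclass inheritance, restrictions and property axioms). The class names, the individuals and the disjointness declaration are hypothetical and are not taken from Trading.owl.

```python
from itertools import combinations


def find_disjointness_clashes(memberships, disjoint_pairs):
    """Return (individual, class_a, class_b) triples that violate a disjointness axiom."""
    disjoint = {frozenset(pair) for pair in disjoint_pairs}
    clashes = []
    for individual, classes in sorted(memberships.items()):
        # Test every pair of classes asserted for this individual.
        for class_a, class_b in combinations(sorted(classes), 2):
            if frozenset((class_a, class_b)) in disjoint:
                clashes.append((individual, class_a, class_b))
    return clashes


# Hypothetical assertions: one individual is typed with two disjoint classes.
memberships = {
    "Cli_Acme": {"Clients"},
    "Ag_Both": {"Clients", "Suppliers"},
}
disjoint_pairs = [("Clients", "Suppliers")]

clashes = find_disjointness_clashes(memberships, disjoint_pairs)
print(clashes)
```

An empty result from such a check would not prove consistency; a full reasoner is still needed for that.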

4. Related work: A comparative analysis with other business ontologies

4.1. An introduction to the most significant business ontologies

Following the construction of our ontology, it is useful to compare it with other existing business ontologies whose domains are similar or close to ours, contrasting focus and components and revealing similarities and differences. We have selected the most representative business ontologies currently available, and we describe the main ideas put forward by their authors:

BMO (Business Management Ontology). The BMO [48] ontology, or set of ontologies, represents an integrated information model which, along with the design of the business processes, incorporates project management, requirements and performances to make up the fundamentals of a company management knowledge base. Its main users are business analysts, but it has also been taken up by information technology (IT) experts to establish correspondences with related software deficiencies such as business focus and the description of Web services. It enables the definition of the private and public processes of companies, entities and business focus, as well as the services implemented by the processes of activity. It follows the UN/CEFACT methodology. The BMO currently contains 40 ontologies that define some 650 classes belonging to various domains within the business environment. BMO characteristics include: (a) it links ontologies to produce a single general vision; (b) multiple ontologies can exist in parallel to serve many organizations or industries, and each industry-specific ontology integrates and extends the companies' generic domain ontologies; (c) it associates the company's fundamental process concepts with generic business concepts such as properties, company rules and documents; and (d) it contains technically oriented ontologies such as the ontology of business focuses.

REA (Resource–Event–Agent). This is a business domain ontology originally created to define and develop accounting systems. Like its data semantic model, the REA business model is based on the Entity-Relationship metamodel, but it contains additional ontological primitives, axioms and guidelines that help to construct and validate the conceptual models of the information systems belonging to accounting systems. The REA was initially created [49,50] to model accounting systems. However, it was found to be very useful and intuitive, as it brought a greater understanding of company processes, and it became one of the best frameworks both for traditional companies and for electronic commerce systems. It has been extended in order to provide concepts of greater use in understanding aspects of processing as well as economic aspects (economic exchanges). For almost 20 years, the REA model has been widely used as an instrument for teaching business students how to design accounting databases [51]. However, few companies use it in systems development practice. Recently, researchers and professionals have renewed their interest in this model for two reasons (see [52]):

(a) The developers of the REA model have joined the ISO OpenEDI initiative, UN/CEFACT, OAG and eBTWG. Their participation has resulted in the adoption of the REA model as a business ontology in the UMM [UN/03] methodology and the ECIMF system, as indicated in [53].
(b) The REA model has been proposed as the theoretical foundation for the basic models of ERP (Enterprise Resource Planning) systems.

The REA business ontology represents economic activity through three classes of elements that can be identified in each economic exchange or conversion process: economic resources, economic events and economic agents. Each economic resource is linked to the economic events that cause its entry and exit flows (stock flow). In addition, each economic event that results in a resource entry (a purchase, for example) is necessarily paired with an exit event (a payment), and vice versa (duality). The participation relationship describes the agents involved in an economic event. This simple ontological pattern is the basis of the REA ontology, and it comes from McCarthy's original model [54]. The economic events are performed within a chain or ordered sequence of actions called business processes. The architecture thus has three levels (value chain, business processes and economic events). With advances in technology and specification methods, the applicability of the REA conceptual model has also changed. Works by Geerts and McCarthy [55] have widened its basic framework several times over recent years. The REA changes the direction of the transaction model in two ways: it moves away from the traditional vision, inherited from a finance-based view of the single company, and it moves towards large enterprises, with the new perspective of modern ERPs and the information systems of the various types of electronic commerce. With the entry of commercial partners and long-term relationships, there is a need for more reliable and predictable structures in which both parties formalize their exchange contracts. The REA ontology answers this need by adding classes such as economic commitment, economic contract and agreement. The REA establishes correspondences with one of the ontologies from the IEEE Standard Upper Ontology Working Group, called SUMO (Suggested Upper Merged Ontology), which is a higher-level ontology.
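The basic REA pattern described above (resources, events, agents, stock flow, duality and participation) can be written down as a small data structure. The Python sketch below is an illustrative simplification, not the REA specification: the class and field names are hypothetical, directions are expressed from the viewpoint of a single company, and duality is modelled as a one-to-one pairing although the ontology allows more general pairings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EconomicAgent:
    name: str


@dataclass
class EconomicResource:
    name: str


@dataclass
class EconomicEvent:
    label: str
    resource: EconomicResource   # stock flow: the resource entering or leaving
    direction: str               # "inflow" or "outflow", seen from the company
    provider: EconomicAgent      # participation: who gives up the resource
    receiver: EconomicAgent      # participation: who receives it
    # Duality link; excluded from repr and comparison to avoid cyclic recursion.
    dual: Optional["EconomicEvent"] = field(default=None, repr=False, compare=False)


def pair_as_duality(inflow, outflow):
    """Link an inflow event with the outflow event that compensates it."""
    if inflow.direction != "inflow" or outflow.direction != "outflow":
        raise ValueError("duality pairs an inflow event with an outflow event")
    inflow.dual = outflow
    outflow.dual = inflow


def unpaired(events):
    """Return the labels of events that still lack a dual event."""
    return [event.label for event in events if event.dual is None]


company = EconomicAgent("Company")
supplier = EconomicAgent("Supplier")
purchase = EconomicEvent("purchase", EconomicResource("goods"), "inflow", supplier, company)
payment = EconomicEvent("payment", EconomicResource("cash"), "outflow", company, supplier)
delivery = EconomicEvent("delivery", EconomicResource("goods"), "outflow", company, supplier)

pair_as_duality(purchase, payment)
events = [purchase, payment, delivery]
print(unpaired(events))
```

Here the purchase and the payment satisfy the duality requirement, while the delivery is reported because no compensating event has been recorded for it.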


EO (Enterprise Ontology). The Enterprise Ontology [56] has been developed as part of the Enterprise project by the University of Edinburgh's Artificial Intelligence Applications Institute, together with IBM, Lloyd's Register, Logica UK Limited and Unilever. The project has been backed by the UK's Department of Trade and Industry within its Intelligent Systems Integration programme (IED4/8032). Its business modelling aims to achieve a global vision of the organization such that it can be used as a basis for decision-making. It is not a traditional organizational vision but a vision that comprises the subject matter or domain of the organization's functioning. To achieve this aim, strong, flexible tools are needed to support integration and communication. During the EO development process, the aim was to embrace those concepts that were widely accepted in the business world, presenting their definitions in natural language. It begins with basic concepts (activity, relationship, actor, for example). These are used to define the main body of the terms, which is divided into the following areas or sections: activities and processes, organization, strategy and marketing.

TOVE (TOronto Virtual Enterprise). The TOVE project aims to develop an integrated set of ontologies for public and private company modelling, with the following features [57]:

(a) To provide a terminology that the company can share, so that each agent can use and understand it.
(b) To define the meaning (semantics) of each term as precisely as possible. Implementing the semantics within a set of axioms will give TOVE the ability to automatically deduce the answers to most common-sense questions about the company.
(c) To define a group of symbols to graphically represent a term or the concept built from it.

A series of ontologies covering various aspects of the business environment has been developed in the TOVE [58] framework. These ontologies have recently been structured on three levels:

(a) Foundation ontologies, which include, among others, ontologies on activities [59], resources [57] and organization [60].
(b) Derived ontologies, which include, among others, ontologies on quality management [61], product design [62] and costs [63].
(c) Business ontologies, which include ontologies on company design, flow of materials, projects and business processes.

e3_value. With the new information technologies, business models based on e-business have an increasingly wide field of development, and their share of all business transactions grows daily. This ontology moves towards a more rigorous conceptualization of business modelling. The main feature of the e3_value ontology is its basis in value (see [64]). It consists of a conceptual approach whose objective is the representation of business models. The notion of value, and how objects of value are created, exchanged and consumed in a multi-agent network, is the central theme of this ontology. It enables the representation and analysis of many non-trivial ideas on business domains, including processes and mechanisms that are important to the company, such as the causality of income flows, client property, the ability to fix prices and choose alternative agents to deliver objects of value, and company business. Based on the notion of value as a foundation of electronic commerce modelling, the following concepts are defined: Agent, Activity of the Value, Object of the Value, Port of the Value, Interface of the Value, Exchange of the Value and Offer of the Value. This value-oriented ontology has two benefits [64]: better communication between those who must make decisions on the essential points of the model and the shareholders; and a greater and more complete understanding of the operations of electronic commerce and its requirements.

4.2. Comparison

To compare our ontology with the others we follow Pateli and Giaglis [65] and Jasper and Uschold [66], providing a fixed comparative framework whose main elements are as follows:

(1) Purpose of the ontology. Motives or reasons that justify the existence of the ontology (achieving better communication, interoperability, reliability, specification, representation and knowledge, etc.).
(2) Focus of the ontology. The focus of attention differs from model to model. Some centre on one company, others look to a variety of companies. Some are based on strategy, others on operational aspects. Some focus specifically on technology, while others turn their attention to innovation or work on both simultaneously.
(3) Components of the ontology. This refers to the concepts, relationships or properties, rules and axioms that the ontology uses to represent the business model.
(4) Role of the ontology. Whether it contains operational data, or is made up of concepts, relationships and axioms in order to contain operational data, or whether it is a language that expresses ontologies on the two previous levels.
(5) Ontological representation. The degree of meaning that an ontology represents. This varies greatly from one ontology to another. The simplest ontologies are a simple collection of terms. The meaning captured by an ontology varies according to the quantity of the data represented and the degree of formality of the representation. According to the quantity of data, there are lightweight ontologies (a limited number of concepts, relationships and axioms) and heavyweight ontologies (a greater number of concepts, relationships and axioms). According to the degree of formality, they range from "highly informal" to "strictly formal".

The comparison follows in Table 3.


Table 3 A comparison of the ontology Trading.owl with other business ontologies.

| Ontology | Purpose of the ontology | Focus of the ontology | Components of the ontology | Role of the ontology | Maturity of the ontology | Ontological representation |
|---|---|---|---|---|---|---|
| Trading.owl | To improve business model's representation and shared knowledge. A more specific level, based on the modelling of specific companies in transition. | A single ontology; focus more specific to a group of companies. It is semantically covered, as a branch of the BMO tree. | A taxonomy organized into a tree with nine superclasses or branches with their properties, restrictions and entities. | All the ontologies are made up of concepts, relationships and axioms to have operational data to express their business model. | Not implemented yet in applications. It is the foundation for the transformation of the business model in companies. | Middleweight or heavyweight ontology, in terms of its metrics: named classes 136, nameless classes 178, properties 106, annotations 28, and individuals 13475. Codified in OWL language. It has passed all the current reasoning tests. |
| BMO | To improve business model's representation and shared knowledge. | A set of ontologies to provide high levels of defining processes and objects of companies. | It includes areas of interest as value-related elements, like client segments, distribution channels, relationship mechanisms, basic resources and capacities, and relationship between these elements. | Made up of concepts, relationships and axioms to have operational data to express their business model. | Used in the academic setting and in consultancy. | Lighter ontology with its limited number of concepts, relationships and axioms. The version tested is inconsistent. |
| REA | Oriented to academic level (to add semantic content to accounting systems' modelling). | Focus on the "economic events" aggregated that relates resources and agents. | Generic concepts and properties related to economic transactions. | Made up of concepts, relationships and axioms to have operational data to express their business model. | Not implemented yet in applications. It is the foundation for the UMM methodology. | It is a lighter ontology, codified in RDF. However, it links up with a higher-level ontology, SUMO. |
| EO | To provide a methodology and software tools for business modelling in a changing environment. | Attends to generic concepts of activity and processes, organization, business strategy and marketing. | Defines the concepts of Activity, Organization, Strategy, Marketing, Time, etc. | Made up of concepts, relationships and axioms to have operational data to express their business model. | Includes several companies, such as Lloyd's Register, Unilever and IBM. | Most recent version consulted is in KSL. Moderate levels of generality and maturity. Edition format: Ontolingua. Source code: Lisp. |
| TOVE | An integrated set of ontologies whose models are valid for public and private enterprises. | A set of ontologies which presents a theoretical model on the company. | Consists of three blocks of ontologies, each one defining the terms that are specific to its domain of interest from an economic perspective. | Made up of concepts, relationships and axioms to have operational data to express their business model. | Used in the academic setting and in a process of aggregation of ontologies. | Set of ontologies. Codified in Lisp. Number of elements not shown. It contains no individuals. |
| E3_value | To define concepts from the value perspective. | Focus on the generic concept of value. | Defines a series of elements and relationships within concepts of actors, objects, ports and interfaces of value, activities and exchanges of value. | Made up of concepts, relationships and axioms to have operational data to express their business model. | Used in the practice of entrepreneurial development in several industries, as well as in academic courses. | A lighter ontology. Code not accessible. |
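Metrics of the kind reported for Trading.owl in Table 3 (named classes, nameless classes, properties, individuals) can be approximated by counting elements in an RDF/XML file. The following Python sketch is a rough illustration under stated assumptions: the sample document and its namespace are hypothetical, only declarations carrying rdf:ID are counted as named, and the counting rules are a simplification of what an ontology editor such as Protégé reports.

```python
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hypothetical miniature ontology, in the style of the fragments in Section 3.4.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://example.org/trading#">
  <owl:Class rdf:ID="Clients"/>
  <owl:Class rdf:ID="Country"/>
  <owl:ObjectProperty rdf:ID="have_country"/>
  <Clients rdf:ID="Cli_A">
    <have_country><Country rdf:ID="Ct_Spain"/></have_country>
  </Clients>
</rdf:RDF>"""


def ontology_metrics(rdf_xml):
    """Count classes, object properties and individuals in an RDF/XML string."""
    metrics = {
        "named_classes": 0,
        "anonymous_classes": 0,
        "object_properties": 0,
        "individuals": 0,
    }
    for element in ET.fromstring(rdf_xml).iter():
        has_id = element.get(RDF + "ID") is not None
        if element.tag == OWL + "Class":
            metrics["named_classes" if has_id else "anonymous_classes"] += 1
        elif element.tag == OWL + "ObjectProperty":
            metrics["object_properties"] += 1
        elif has_id and not element.tag.startswith((OWL, RDF)):
            # An identified element typed by a domain class is treated as an individual.
            metrics["individuals"] += 1
    return metrics


metrics = ontology_metrics(SAMPLE)
print(metrics)
```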

5. Final remarks

5.1. Limitations and future research

The methodology has so far been tested in a single domain, that of a commercial energy trading company. While we believe that the results can be generalized across other domains, this still needs to be tested.

While our ontology can be expressed in OWL, it has not yet been integrated with other existing ontologies. Future research should examine the ease of any such integration. Given that the ontology is created from local sources, the quality of the global model depends mainly on the quality of the local model. In our case, the ontology has been refined in two ways: avoiding inclusion of the databases' errors and inconsistencies; and not adding concepts that were not found in the databases. Another quality element has been the interaction with experts and users, drawing lessons from their work in practice. Thus we can state that the ontology reflects the knowledge of the company. Other directions for future research include:

• The possibilities of aligning or merging specific ontologies, such as the one we have presented, with others which are more general.
• The development of reasoners with greater checking capacity, which would be of great importance, not only in terms of memory use but above all for their capacity to reason and explain errors.
• The development of projects that investigate and produce software applications capable of using ontologies in a fully automated manner, in order to convert their enormous quantity of data into information and productive knowledge. We believe this to be vitally important.

5.2. Conclusions

We have presented a methodology for creating business ontologies based on the existing database structures and information systems of an organization. We have applied that methodology to a specific case study to produce a real ontology based on the target organization’s information systems. Finally, we compared the resulting ontology with other, more established business ontologies. This comparison leads us to the following conclusions regarding our methodology.

The Trading.owl ontology that was produced using our methodology is applicable to companies belonging to the energy trading sector. It is quite complete for this field as, apart from the metadata, it integrates the entire semantic support of the uses, habits, norms and ways of understanding a specific domain. Using our methodology results in an ontological model that is far better suited to the subject company than the generic models represented by the other leading business ontologies.

Our ontology contains concepts that could easily be integrated within the models represented by the majority of the ontologies described. However, we believe it would be very difficult to align these general ontologies with ours. Without doubt, these ontologies would provide Trading.owl with many enriching concepts, in particular those economic and business concepts whose rigorous definitions could complete the meaning of Trading.owl’s concepts. On the other hand, an ontology produced following our methodology could provide all these business ontologies with the richness offered by specific knowledge directly extracted from real companies and fully treated from the formal logic point of view.