ARTICLE IN PRESS Information Systems 35 (2010) 404–416
Information Systems journal homepage: www.elsevier.com/locate/infosys
An ontology modelling perspective on business reporting
Marcus Spies
Knowledge Management, Ludwig-Maximilians-Universität München, Digital Enterprise Research Institute, Universität Innsbruck, Austria
Article info
Keywords: Enterprise information integration and interoperability; Languages for conceptual modelling; Ontological approaches to content and knowledge management; Ontology-based software engineering for enterprise solutions; Domain engineering
Abstract
In this paper, we discuss the motivation and the fundamentals of an ontology representation of business reporting data and metadata structures as defined in the eXtensible business reporting language (XBRL) standard. The core motivation for an ontology representation is the enhanced potential for integrated analytic applications that build on quantitative reporting data combined with structured and unstructured data from additional sources. Applications of this kind will enable significant enhancements in regulatory compliance management, as they combine business analytics with inference engines for statistical as well as logical inferences. In order to define a suitable ontology representation of business reporting language structures, an analysis of the logical principles of the reporting metadata taxonomies and further classification systems is presented. Based on this analysis, a representation of the generally accepted accounting principles taxonomies in XBRL by an ontology in the web ontology language (OWL) is proposed. An additional advantage of this representation is its compliance with the recent ontology definition metamodel (ODM) standard issued by OMG. © 2009 Elsevier B.V. All rights reserved.
Corresponding author at: Institute of Informatics, LMU University of Munich, Leopoldstr. 13, D-80802 Munich, Germany. Tel.: +49 89 2180 5164. E-mail address: [email protected]
0306-4379/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.is.2008.12.003
1. Introduction
In the present decade, several severe crises in international financial markets have been caused by the failure of financial institutions to adequately balance credit risks against the capital required to cover probable and, at least to some degree, unexpected losses. As a consequence, comprehensive legislation and regulations have been devised and implemented to allow banks and other financial institutions to better assess credit risks and to allow supervisory bodies to enforce the building of adequate capital reserves. One of the best understood and most widely implemented regulations is the Basel II accord [1]. A second major area of regulation in the financial sector is related to business reporting. Driven by the need to integrate data across countries and business sectors for adequate economic decisions on national and supranational political levels in the growing free trade areas, e.g., in Europe or in the Americas, business reporting data structures and practices have to follow a set of common representation schemes and rules. The two major regulatory efforts in this respect are the international financial reporting standards (IFRS) and the eXtensible business reporting language (XBRL) [3].
1.1. Problem statement
The key management discipline to have emerged in response to the increasing regulatory compliance requirements is business performance management. Well-structured and standards compliant reporting data, by themselves, represent only a one-sided approach to better business performance management, since improvement is primarily achieved through adequate assessment and the application of due
diligence in balancing any risks affecting the business. The full potential of risk-aware business performance management can be realized only through the integration of reporting with business analytics, as is common in existing data warehousing and data mining solutions. For many years, these solutions have been part of, or woven into, the enterprise IT infrastructures of nearly all major enterprises; recently, they are also being disseminated as SME-specific solutions as a result of the SME-specific IT services strategies of most major vendors of enterprise IT applications. When it comes to integrating the comparatively new standardized business reporting data with existing analytic applications from the data warehousing and data mining fields, architects of business performance management solutions face barriers related to application integration and information integration (see [4]). The main information integration barrier is that the new business reporting languages (XBRL and its extensions) are based on data taxonomies represented in a format (described in more detail below) that is not readily transferable to or integrable with standard data warehousing schemata, e.g., the well-known star schema with its combination of a fact table with several dimension tables. The key application integration barrier is that reporting data processors are based on XML technologies like XSLT 2.0 [23] and XQuery, while data warehouse processing (and the data mining analytics building on it) is based on advanced SQL query processing, often integrated with programming language wrapping (e.g., Oracle PL/SQL). Recently, in order to overcome these barriers, the XBRL community has proposed function [5] and dimension [6] specifications that are designed to cover at least part of the data warehousing analytical solutions space by extending the scope of XBRL processing.
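To make the information integration barrier concrete, the following is a minimal sketch of the warehouse side: a star schema built with Python's standard sqlite3 module. All table names, columns, entities and amounts are hypothetical and chosen only for illustration. Analytic queries join a central fact table to dimension tables by key, whereas an XBRL instance document is a flat list of XML elements whose hierarchy lives in separate linkbases.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_entity (entity_id INTEGER PRIMARY KEY, name TEXT, sector TEXT);
CREATE TABLE dim_period (period_id INTEGER PRIMARY KEY, fiscal_year INTEGER);
CREATE TABLE fact_report (
    entity_id INTEGER REFERENCES dim_entity(entity_id),
    period_id INTEGER REFERENCES dim_period(period_id),
    item      TEXT,
    amount    REAL
);
INSERT INTO dim_entity VALUES (1, 'Alpha', 'manufacturing'),
                              (2, 'Beta', 'retail'),
                              (3, 'Gamma', 'manufacturing');
INSERT INTO dim_period VALUES (1, 2007), (2, 2008);
INSERT INTO fact_report VALUES (1, 2, 'bs.ass', 1000.0), (2, 2, 'bs.ass', 500.0),
                               (3, 2, 'bs.ass', 250.0), (1, 1, 'bs.ass', 900.0);
""")

# Typical warehouse query: aggregate a measure along dimension attributes.
rows = con.execute("""
    SELECT e.sector, p.fiscal_year, SUM(f.amount)
    FROM fact_report AS f
    JOIN dim_entity AS e ON e.entity_id = f.entity_id
    JOIN dim_period AS p ON p.period_id = f.period_id
    WHERE f.item = 'bs.ass'
    GROUP BY e.sector, p.fiscal_year
    ORDER BY e.sector, p.fiscal_year
""").fetchall()

for sector, year, total in rows:
    print(sector, year, total)
```

Making XBRL facts available in a fact table of this shape, with the taxonomy supplying the dimension hierarchy, is one way to read the integration problem addressed in the rest of the paper.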
Other developments related to our problem statement are the data cube definition and manipulation language MDX (multi-dimensional expressions, see [15]) by Microsoft and the OLAP data manipulation language (DML) by Oracle [17]. These languages allow elegant formulations of data cube schema definitions and complex manipulations that abstract from the underlying database table representation structure (relational or multidimensional). In the open source community, a related offering is Mondrian, which is part of the Pentaho open source business intelligence suite [16]. Mondrian allows users to define warehouse schemata interactively and generates the SQL statements needed. MDX is being integrated into the web service API XML-A (XML for analysis). Currently, neither MDX nor XML-A has an explicit relationship to XBRL. Consequently, while we certainly need information and application integration in the fields of standards compliant business reporting and business analytics, the current information structure standards do not match well, and the solution space is fragmented on the vendor side as well as on the open source side. This is the problem that the present paper begins to address. We hope that, based on the work initiated in this paper, a sufficiently general metamodelling perspective on business reporting languages will be established that will
allow XBRL dimensions and functions to be integrated easily into data warehouse definition and manipulation languages. The present paper presents a first building block towards an integrative approach to business performance management solutions that combine standards compliant reporting and business analytics. The solution approach is to define a common metamodel in terms of a sufficiently rich modelling language that allows for mapping transformations between the schemata in question. This metamodel is stated here in general terms and is used to define unified modelling language (UML) class diagrams and web ontology language (OWL) ontologies as general representation formats for XBRL structures and instance data. With this step, applications integrating XBRL data with conventional warehouse data can build on the usual entity relationship models derived from UML for the reporting data, see [9,11]. Moreover, semantic technologies make possible an additional set of innovative services, which will be briefly described.
1.2. Context—semantics enabled business intelligence services
An important motivation for investigating a transformation between XBRL reporting language structures and OWL ontologies is the context of the present work within the EU integrated project MUSING (MUlti-industry, semantic-based next generation business INtelliGence, see http://www.musing.eu). Specifically, MUSING proposes to use ontologies as the key integrative formalism for representing knowledge and data schemata in a multitude of services, notably those based on applications that integrate text mining with traditional business intelligence applications. A striking example of the advantage of this integration is the exploitation of textual reporting items in XBRL reporting structures for extended analytical applications. These textual items often convey key information to the business analyst that remains unused in purely quantitative analytic applications.
Components in MUSING using ontologies for financial reporting are:
balance sheet data extraction from pdf documents: the pdf2xbrl tool of DFKI Saarbrücken, see [12];
text mining for balance sheet relevant information: the GATE tool (Sheffield University) provides annotations of texts from news services (ontology enabled text classifications) and extractions of specific facts from annotated texts [13];
probabilistic analysis of reporting data: statistical analysis of risks using event, cause and area taxonomies (for a general introduction, see [14], and for an example in the field of loss analysis, see [18]).
Both pdf2xbrl and GATE create new instances in existing ontologies from textual input. This functionality is often referred to as ontology population. Thus, these tools serve as a semantic ETL (extraction,
transformation and loading) layer in a semantics enabled business analytics architecture. The ontologies can generally be compared to data warehouse schemata; however, they contain considerably more information, which will be described in detail in the relevant MUSING publications. The basic purpose of MUSING is to integrate these and further components using an enterprise service bus architecture for flexible and dynamically composable services in various business scenarios related to risk management. To give an impression of the potential of the approach, we mention services in the three key industry areas of MUSING together with examples of relevant textual data. Since MUSING is a large scale integrated project, the services domain of each industry area has a driving industry component integrator, which is mentioned in parentheses:
Financial risk management: services supporting advanced rating and assessment procedures for banks. Textual data come from balance sheet explanatory notes and from analyst texts (Banca di Monte dei Paschi di Siena).
IT operational risk management: services supporting assessment and mitigation of risks related to IT managed operations. Textual data come from written failure descriptions by customer organizations (KPA).
Internationalization consulting: services supporting the internationalization of operations. Textual data come from target region government and analyst reports (Verband der Vereine Creditreform).
2. XBRL—a brief tour
The fundamental idea of XBRL [3] is to allow for a conceptual and physical separation of reporting facts from reporting metadata. The metadata are used to convey the conceptual meaning of reporting data items in a standardized way. Facts are represented in XBRL instance documents using XML elements, and metadata are available via linkbases (documents specifying typed links between referenced elements in XML documents, based on the XLink specification, see [29]). In a full XBRL reporting metadata representation, one XML schema file is needed to define the instance data elements, and at least four linkbases are needed to define various aspects on the conceptual level:
label: human readable descriptions of reporting elements in a given natural language;
calculation: aggregation rules for reporting items;
presentation: hierarchies and priorities for representing reporting elements in graphical interfaces;
reference: text from legal documents or regulations applicable to a given reporting item.
In older versions, an additional definition linkbase was used to represent the reporting element hierarchies; it has recently been merged into the presentation linkbase hierarchy. Fig. 1 gives an overview of this overall structure.
[Fig. 1 diagram labels: reporting taxonomy; reporting items; context (reporting entity, reporting period); currency (unit, ...); facts data; concepts metadata; label; calculation rules; legal reference]
Fig. 1. Illustration of the basic structure of XBRL that builds on a distinction of the fact layer from the conceptual layer. Quoted from the ABRA toolset documentation [19].
Reporting item taxonomies are a key element of the conceptual component of XBRL. These taxonomies are related to conceptual hierarchies and often correspond to aggregation hierarchies on the calculation level. For example, a company may deliver a reporting document myReportDE.xml complying with the elements required in a German-specific reporting XML schema document. A small snippet of the current XML schema defining the reporting elements for the generally accepted accounting principles (GAAP) in Germany looks as follows:
<element name="bs" type="xbrli:stringItemType" abstract="true" substitutionGroup="xbrli:item" nillable="true" id="de-gaapci_bs" xbrli:periodType="instant"/>
<element name="bs.ass" type="xbrli:monetaryItemType" substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true" id="de-gaapci_bs.ass" xbrli:balance="debit" xbrli:periodType="instant"/>
<element name="bs.ass.unpaidCap" type="xbrli:monetaryItemType" substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true" id="de-gaapci_bs.ass.unpaidCap" xbrli:balance="debit" xbrli:periodType="instant"/>
<element name="bs.ass.unpaidCap.called" type="xbrli:monetaryItemType" substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true" ...
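The element declarations quoted above carry exactly the metadata a downstream analytic application needs: item name, data type, balance side and period type. As a minimal sketch, assuming only Python's standard xml.etree.ElementTree module, they can be read into a lookup table. The wrapping schema root is added here only to make the excerpt parseable, and the truncated fourth declaration is left out.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs of XML Schema and XBRL 2.1 instances.
XS = "{http://www.w3.org/2001/XMLSchema}"
XBRLI = "{http://www.xbrl.org/2003/instance}"

# The first three German GAAP declarations from the snippet, wrapped in a root.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
                    xmlns:xbrli="http://www.xbrl.org/2003/instance">
  <element name="bs" type="xbrli:stringItemType" abstract="true"
           substitutionGroup="xbrli:item" nillable="true" id="de-gaapci_bs"
           xbrli:periodType="instant"/>
  <element name="bs.ass" type="xbrli:monetaryItemType"
           substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true"
           id="de-gaapci_bs.ass" xbrli:balance="debit" xbrli:periodType="instant"/>
  <element name="bs.ass.unpaidCap" type="xbrli:monetaryItemType"
           substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true"
           id="de-gaapci_bs.ass.unpaidCap" xbrli:balance="debit"
           xbrli:periodType="instant"/>
</schema>"""

def reporting_items(schema_text):
    """Map each declared item name to the metadata most relevant for analysis."""
    root = ET.fromstring(schema_text)
    items = {}
    for el in root.iter(XS + "element"):
        items[el.get("name")] = {
            "type": el.get("type"),
            "abstract": el.get("abstract") == "true",
            "balance": el.get(XBRLI + "balance"),  # None when not declared (bs)
            "periodType": el.get(XBRLI + "periodType"),
        }
    return items

items = reporting_items(SCHEMA)
for name, meta in items.items():
    print(name, meta["type"], meta["balance"], meta["periodType"])
```

Note that nothing in these declarations states that bs.ass is part of bs; that information lives in the linkbases discussed next.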
A reporting document compliant with this schema, like myReportDE.xml, consists essentially of a flat list of XML elements corresponding to the items defined here. Analysis of such a document in terms of the metadata layer is enabled by the XBRL framework for reporting item hierarchies. This framework is based on the XLink specification and uses linkbases to express a variety of semantic attributes of, and relationships between, reporting items. Specifically, presentation (formerly additional definition) linkbases contain item–subitem relationships in an explicit format for each item/subitem pair. In current XBRL taxonomies, these relationships are stated
as parent–child links. For example, in the income statement hierarchy for German GAAP, we have
<presentationArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child" xlink:from="is.netIncome" xlink:to="is.netIncome.regular" use="optional" order="1.0"/>
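Arcs like the one quoted above are enough to rebuild the item hierarchy that the flat instance document lacks. The following minimal sketch collects parent-child arcs into an ordered children map. It makes a simplifying assumption that follows the quoted arc: xlink:from and xlink:to are treated as item names. In real linkbases they reference locator labels, and the link elements live in the XBRL link namespace. The sibling item is.netIncome.extraordinary is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XLINK = "{http://www.w3.org/1999/xlink}"
PARENT_CHILD = "http://www.xbrl.org/2003/arcrole/parent-child"

# Simplified fragment: the second arc is the quoted one; the first is invented.
LINKBASE = """<presentationLink xmlns:xlink="http://www.w3.org/1999/xlink">
  <presentationArc xlink:type="arc"
      xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
      xlink:from="is.netIncome" xlink:to="is.netIncome.extraordinary"
      use="optional" order="2.0"/>
  <presentationArc xlink:type="arc"
      xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
      xlink:from="is.netIncome" xlink:to="is.netIncome.regular"
      use="optional" order="1.0"/>
</presentationLink>"""

def children_by_parent(linkbase_text):
    """Collect parent-child arcs into {parent: [children sorted by 'order']}."""
    root = ET.fromstring(linkbase_text)
    arcs = defaultdict(list)
    for arc in root.iter("presentationArc"):
        if arc.get(XLINK + "arcrole") != PARENT_CHILD:
            continue  # arcs with other roles are ignored
        order = float(arc.get("order", "1"))  # a missing order is treated as 1
        arcs[arc.get(XLINK + "from")].append((order, arc.get(XLINK + "to")))
    return {parent: [child for _, child in sorted(kids)]
            for parent, kids in arcs.items()}

hierarchy = children_by_parent(LINKBASE)
print(hierarchy)
```

Because the hierarchy is derived entirely from the linkbase, swapping in a different linkbase yields a different hierarchy over the same instance data.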
An advantage of this representation is that it allows a hierarchy to be declared in a way that is transparent to the instance data. Therefore, a key feature of XBRL is that an instance document may be analyzed according to different taxonomies, depending on the goal of an analytic application. For example, the German GAAP XBRL taxonomy allows myReportDE.xml to be used in financial analytic applications related to tax calculations. Now, for the purposes of data integration and integrated analytics, say, on a European level, the same instance document myReportDE.xml can be described with a different taxonomy that might group reporting items in a different way, leave out some of them, etc. This is possible by processing myReportDE.xml with an XBRL processor that uses the taxonomy linkbase(s) relevant to the EU-wide integrated analytics. An important application of this principle is the definition of financial ratios from balance sheet items or weighted sums of them. Financial ratios are widely used by rating agencies to assess or predict company performance. An example of calculations of Basel II relevant financial ratios from XBRL reporting items is given in Fig. 2. In principle, no changes to the structure of the instance document myReportDE.xml are needed for this and related analytic uses. To support localization, national language specific labels may be given to balance sheet items without changing their basic names as XML elements. The mapping of XBRL instance document elements needed for analytics on a supranational level is enabled by an additional specification in the XBRL family, namely the XBRL general ledger [2]. Often, as in the examples given here, the naming scheme of XBRL items is related to the taxonomy on a national level. Items in European XBRL dialects are named by appending the taxonomy levels to each other in a dotted hierarchical notation.
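The principle that one instance document can be evaluated under several taxonomies without being changed can be illustrated with a minimal sketch. Here taxonomies are reduced to plain dictionaries mapping a derived quantity to the items it sums, which is a strong simplification of real calculation linkbases. The item names bs.ass.currAss and bs.eqLiab.equity, the values, and both views are hypothetical. The equity ratio (equity divided by total assets) serves only as an example of a ratio built on top of such views.

```python
from types import MappingProxyType

# Hypothetical instance facts (item -> value); the read-only proxy mirrors the
# requirement that analytic processing never modifies the instance document.
INSTANCE = MappingProxyType({
    "bs.ass.fixedAss": 700.0,
    "bs.ass.currAss": 300.0,
    "bs.eqLiab.equity": 250.0,
})

# Two hypothetical taxonomies over the same instance.
NATIONAL_VIEW = {"totalAssets": ["bs.ass.fixedAss", "bs.ass.currAss"]}
RISK_VIEW = {"liquidAssets": ["bs.ass.currAss"], "equity": ["bs.eqLiab.equity"]}

def evaluate(instance, taxonomy):
    """Compute every derived quantity of a taxonomy from an unchanged instance."""
    return {name: sum(instance[item] for item in parts)
            for name, parts in taxonomy.items()}

national = evaluate(INSTANCE, NATIONAL_VIEW)
risk = evaluate(INSTANCE, RISK_VIEW)
equity_ratio = risk["equity"] / national["totalAssets"]
print(national, risk, equity_ratio)
```

Any number of further views can be added in the same way; only the taxonomy argument changes, never the instance.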
As an example, a balance sheet (a string item with name bs) has a section for assets, as opposed to equities and liabilities; the item name for the assets total is then bs.ass. The assets are further decomposed, among others, into fixed and current assets, leading to the item name bs.ass.fixedAss for the fixed assets total, and so on. Apart from integration applications, specific analytic needs also arise in the context of compliance management. For example, the balance sheet instance myReportDE.xml might need to be re-analyzed for specific risk analysis indicators with a suitable linkbase that states which items compose which risk-analytic quantity. Additionally, specific computations might be needed for risk analytic quantities that take into account further reporting data outside the balance sheet, e.g.,
items that are part of the profit and loss (PNL) statement. Such computations can be defined with XBRL function documents. To sum up, by applying different taxonomy linkbases to a given XBRL instance document, many analytic applications with different scopes and purposes are possible without modifying the instance document. Henceforth, we refer to this property of XBRL compliant business reports as the instance transparency of taxonomies (ITT) property. This property also has legal implications, since an electronically submitted reporting document should be digitally signed by the issuing organization and should therefore not allow for uncontrolled modifications in the course of analytical evaluations.
3. Principles and design rationale of a metamodel capturing XBRL metadata
In order to define the methodological principles of an ontology representation of the XBRL reporting structures appropriately, we first need to examine the modelling language of these structures using standard modelling approaches in the context of UML. In particular, as specified in [9,10], it is useful to distinguish at least four levels in modelling a particular domain:
The instance level (M0) comprises the actual objects of the domain in question, e.g., an XBRL reporting document submitted by a given company in a given reporting period.
The instance model level (M1) comprises the XBRL analytic taxonomies and calculation rules relevant for a reporting instance document, e.g., a balance sheet item hierarchy for commercial and industrial enterprises in the scope of the fiscal legislation in the United States.
The modelling language level (M2) comprises the principles of defining analytic taxonomies, e.g., the set of arc types and link roles defined in the XBRL specification for representing a reporting taxonomy. This level is often referred to as the metamodel level. It typically contains model elements that are related to a general information representation approach, like entity relationship structures in database design.
The modelling language definition level (M3) comprises the basic object modelling language or modelling elements in terms of which the M2 level language (the metamodel) can be built. This level is generally referred to as the metametamodel level. For UML, this level contains the so-called Meta Object Facility (MOF), a set of entirely generic object oriented modelling features and services.
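The level distinction above can be made tangible with a deliberately small analogy in Python, offered as an illustration rather than as part of any standard. A Concept class plays the M2 role (what any reporting concept is), Concept instances form a tiny hypothetical M1 taxonomy, and Fact objects are M0 data typed by those concepts; Python's class machinery itself stands in for M3. The entity name ExampleAG and all values are invented.

```python
from dataclasses import dataclass

# M3 analogue: Python's own class machinery, used to define the classes below.

@dataclass(frozen=True)
class Concept:
    """M2 analogue: what any reporting concept is, independent of a taxonomy."""
    name: str
    period_type: str  # "instant" or "duration"
    monetary: bool

@dataclass(frozen=True)
class Fact:
    """M0 analogue: one reported value, typed by an M1 concept."""
    concept: Concept
    entity: str
    period: str  # a date for instant items, "start/end" for duration items
    value: float

# M1 analogue: a tiny hypothetical taxonomy is a set of Concept instances.
TAXONOMY = {c.name: c for c in [
    Concept("bs.ass", "instant", True),
    Concept("bs.ass.unpaidCap", "instant", True),
]}

def conforms(fact):
    """An M0 fact conforms to its M1 concept if the shape of its period matches."""
    is_range = "/" in fact.period
    return is_range == (fact.concept.period_type == "duration")

ok = Fact(TAXONOMY["bs.ass"], "ExampleAG", "2008-12-31", 1000.0)
bad = Fact(TAXONOMY["bs.ass"], "ExampleAG", "2008-01-01/2008-12-31", 1000.0)
print(conforms(ok), conforms(bad))
```

The analogy breaks down beyond this point: Python allows only one instantiation step between a class and its objects here, which loosely parallels the limitation on repeated instantiation that the paper discusses for ontologies.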
In practice, more than these four levels occur, in particular if so-called domain specific modelling languages (DSML) are needed that are more specific than, say, a general database design model. A good example of a DSML is given by logical design and query modelling languages for online analytical processing (OLAP) in the realm of data warehousing, e.g., Microsoft’s MDX. XBRL can be seen as a DSML as well, since it specifies a
Fig. 2. Example of the usage of balance sheet items for the calculation of capital relevant to Basel II risk management.
dedicated set of XML schema and XLink arc types and link roles for the purpose of modelling the domain of financial reporting information structures. From the point of view of UML modelling, XBRL clearly specifies documents on the M0 and M1 levels. Instance documents are M0 level entities, whereas taxonomies and dimension definitions together constitute a set of M1 level models. The XBRL specification is itself a metamodel (on the M2 level, in generic terms); however, it is entirely domain specific. This means that we do not have (at least to the best of the author’s knowledge) a generic modelling approach into which XBRL fits. From a practical standpoint, this does not matter as long as we confine ourselves to using this reporting language for interaction between businesses and governmental or regulatory institutions. The general embedding of XBRL becomes relevant, however, in the context of the problem statement of the present paper. In practice, it is necessary to integrate XBRL compliant reporting with other taxonomy-based approaches, e.g., OLAP as delivered in standard business intelligence suites from many vendors. A general embedding of XBRL in object modelling is also key to defining the modelling principles of an ontology representation of XBRL reporting structures that is designed to integrate financial reporting with information extraction from analysis reports, etc., as in the MUSING project (see above). Therefore, we now present an approach to understanding XBRL in terms of a metamodel that is part of a general modelling approach conforming to UML. Metamodels in the realm of data analysis and data warehousing can be seen as models of metadata, see [21]. Since XBRL concept taxonomies, calculation linkbases and related documents constitute an M1 model of the reporting data in XBRL instance documents, a UML conformant metamodel of these documents must capture the general classes and relationships characterizing these metadata.
From the XML schema perspective, a metamodel for XBRL can be constructed using the XBRL schema documents together with the XLink schema (which can be generated from the published XLink DTD). This kind of metamodel would constitute a display surface metamodel according to the definition in [8]. From the more general perspective of metadata exchange between applications running on different platforms (like data warehousing platforms based on Mondrian, SQL DML, DMX, etc., see above), a metamodel formulation is desirable that captures the basic structure of the business reporting domain. From the modelling standards point of view, a natural general choice for a metamodel in this situation is the common warehouse metamodel (CWM) as issued by OMG [21]. CWM is a metamodel for business analytics metadata exchange. In its present version, issued before the latest UML specification, CWM also contains a general object metamodel. For our purposes, in particular, CWM comprises an OLAP package. Therefore, viewing key XBRL structures on the M2 level in their relationship to CWM will make it possible to define transformations from XBRL taxonomies to specific analytic workspace structures on a specific data warehouse platform. While a full representation of XBRL in
terms of CWM is outside the scope of the present paper, we focus here on monetary reporting items and outline a metamodel of the XBRL document structures in terms of CWM. The main idea behind the representation proposed here is that the metadata in an XBRL model for monetary reporting items are tantamount to defining a data cube in the data warehousing sense (see, e.g., [7]). The monetary items in a reporting document are reconstructed as a fact table with monetary variable values (characterized by metadata defining context and currency). According to this metamodel, a balance sheet is composed of two dimensions, namely those corresponding to the assets and the equities/liabilities subsections of the reporting document. All discoverable taxonomies in an XBRL taxonomy linkbase applicable to the reporting document are captured as additional dimensions of the analytic data cube. Further dimensions can be constructed from the so-called context data of an XBRL instance document, referring to reporting period, reporting company, its industry sector, and so on. They can be addressed on the specific modelling level M1 using the aforementioned XBRL dimension specification. The hierarchy in a warehouse dimension for monetary reporting items is represented in XBRL by the parent–child relationships stated in a taxonomy linkbase document. A fact table may contain additional numeric and/or textual attributes. It is important to adequately capture the conceptual hierarchy or taxonomy component of XBRL in a data warehouse metamodel view. This conceptual hierarchy is not just a business taxonomy. Business taxonomies and vocabularies are captured in CWM in the business nomenclature package, which is separate from the OLAP package.
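The cube reading of monetary items described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the metamodel itself. Leaf-level facts form a fact table keyed by item and context (entity, period, unit). A parent-child map, as stated by taxonomy arcs, defines the dimension hierarchy. Every inner node is filled by summing its children per context. Only the assets branch is modelled, and entities, item names other than those quoted earlier, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical leaf-level monetary facts: (item, entity, period, unit, value).
FACTS = [
    ("bs.ass.fixedAss", "A", "2008-12-31", "EUR", 700.0),
    ("bs.ass.currAss", "A", "2008-12-31", "EUR", 300.0),
    ("bs.ass.fixedAss", "B", "2008-12-31", "EUR", 400.0),
    ("bs.ass.currAss", "B", "2008-12-31", "EUR", 100.0),
]

# Dimension hierarchy as parent -> children, as stated by parent-child arcs.
CHILDREN = {"bs": ["bs.ass"], "bs.ass": ["bs.ass.fixedAss", "bs.ass.currAss"]}

def roll_up(facts, children, root):
    """Return {(item, entity, period, unit): value} for every hierarchy node.

    Leaf values come from the fact table; each inner node is the sum of its
    children, computed separately for each (entity, period, unit) context.
    """
    leaf = defaultdict(float)
    for item, entity, period, unit, value in facts:
        leaf[(item, entity, period, unit)] += value
    contexts = {(entity, period, unit) for _, entity, period, unit, _ in facts}
    cube = {}

    def total(node, ctx):
        kids = children.get(node, [])
        if kids:
            value = sum(total(kid, ctx) for kid in kids)
        else:
            value = leaf.get((node, *ctx), 0.0)
        cube[(node, *ctx)] = value
        return value

    for ctx in contexts:
        total(root, ctx)
    return cube

cube = roll_up(FACTS, CHILDREN, "bs")
print(cube[("bs.ass", "A", "2008-12-31", "EUR")])
```

Additional dimensions such as industry sector would enter as further key components of the context, in the same way as entity and period.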
XBRL taxonomies are to be characterized as compositional hierarchies rather than pure conceptual hierarchies since, as we saw, the values of many monetary reporting items on higher hierarchy layers are obtained from aggregation functions applied to subordinate item values. The following definitions from [21] apply precisely to XBRL monetary item hierarchies: A Dimension has zero or more Hierarchies. A Hierarchy is an organizational structure that describes a traversal pattern through a Dimension, based on parent/child relationships between members of a Dimension. Hierarchies are used to define both navigational and consolidation/computational paths through the Dimension (i.e., a value associated with a child member is aggregated by one or more parents). In particular, CWM distinguishes level hierarchies from value hierarchies. While the former possess a well-defined number of layers, the latter can be expressed only as a DAG (directed acyclic graph) linking parent and child elements. Common examples of level hierarchies include region maps with each level corresponding to a regional entity containing entities of the next lower level (e.g., a city contained in a region contained in a country). Typically, there are several members at each level of a level-based hierarchy. Again, citing from [21], we have: A ValueBasedHierarchy defines a hierarchical ordering of members in which ... the topological structure of the
Fig. 3. Basic relationships between OMG common warehouse metamodel (CWM) elements and eXtensible business reporting language (XBRL) modelling structures. XBRL structures are defined in terms of corresponding CWM metamodel stereotypes where applicable. Note that not all XBRL structures mentioned here are explicitly defined in a set of XBRL domain specific metadata documents. For convenience, XBRL file structures defined to contain the related information for a specific XBRL reporting model are also contained in the diagram (for further explanations, see text).
hierarchy conveys meaning. ... ValueBasedHierarchy can be used to model pure linked node hierarchies (e.g., asymmetric hierarchical graphs or parent-child tables). LevelBasedHierarchy contains an ordered collection of HierarchyLevelAssociations that defines the natural hierarchy of the Dimension. The ordering defines the hierarchical structure in top-down fashion ... While the specification of an XBRL taxonomy using XLink in an XML linkbase document evidently uses a topological representation in terms of a directed acyclic parent–child graph, the general representation of such a taxonomy is not limited to this topology. In fact, the naming of reporting item elements by concatenating dotted level names (as demonstrated above) corresponds to a level-based rather than a value-based hierarchy. Similarly, for many implementation purposes, a representation in terms of a level-based hierarchy is more apt, since it makes explicit the aggregation or consolidation steps used by statistical analysis of XBRL documents. Fig. 3 summarizes the key metamodel elements that are useful for a data-warehouse-conforming view of key XBRL metadata structures. For example, an XBRL reporting taxonomy (physically, usually a presentation linkbase file) is stated to correspond to a dimension deployment within a cube dimension (not shown). The single arcs in such a linkbase correspond to a StructureMap in CWM that defines the hierarchical structure of the dimension by parent/child relationships. The elements between which such relationships are stated are given in an XML schema file listing the reporting elements by name, type and some further metadata.
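The observation that a parent-child (value-based) XBRL hierarchy can be read as a level-based one is easy to check mechanically. The sketch below assigns each member a level by breadth-first traversal from the roots and then groups members per level. For dotted item names, the level should equal the number of dots, which is the correspondence noted above. The item bs.eqLiab is a hypothetical name for the equities/liabilities section.

```python
from collections import defaultdict, deque

def levels_from_parent_child(children):
    """Assign each member of a parent-child (value-based) hierarchy to a level.

    Roots are members that never occur as a child. Breadth-first traversal
    gives every member its shortest distance from a root, so each member gets
    one well-defined level even if the graph is a DAG rather than a tree.
    """
    all_children = {c for kids in children.values() for c in kids}
    roots = [p for p in children if p not in all_children]
    level = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in level:
                level[child] = level[node] + 1
                queue.append(child)
    return level

def group_by_level(level):
    """Turn {member: level} into {level: [members]}, i.e., a level-based view."""
    grouped = defaultdict(list)
    for member, depth in level.items():
        grouped[depth].append(member)
    return dict(grouped)

# Hypothetical balance sheet fragment in parent-child form.
CHILDREN = {
    "bs": ["bs.ass", "bs.eqLiab"],
    "bs.ass": ["bs.ass.fixedAss", "bs.ass.currAss"],
}
level = levels_from_parent_child(CHILDREN)
# With dotted names, the level should equal the number of dots in the name.
consistent = all(depth == name.count(".") for name, depth in level.items())
print(group_by_level(level), consistent)
```

A mismatch between traversal depth and dot count would signal that the naming scheme and the linkbase disagree, which is one cheap consistency check an integration tool could run.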
The structured name metaclass generalizes the two naming schemes used for XBRL reporting items based on short names like bs for balance sheet: one naming scheme uses dotted notation, the other camel case capitalization. Technically, Fig. 3 is based on a UML 2.1 profile representation of the CWM metamodel (a UML 1.4 model in the current specification). The stereotypes from this profile are used here for the metamodelling or M2 level, which amounts to specifying XBRL as a domain specific modelling language based on a CWM profile. The UML 2.1 representation of CWM was generated using Eclipse EMF, and the transformation to a UML profile was realized with MagicDraw 15.5.
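As a minimal sketch of what a structured name captures, the function below parses both naming schemes into a common form: the scheme plus an ordered tuple of name parts. StructuredName here is a hypothetical stand-in, not the CWM metaclass itself. AssetsCurrent and IFRSRevenue are used only as examples of the camel case style, and treating every name without a dot as camel case is a simplifying assumption (a single-token name like bs fits either scheme).

```python
import re
from typing import NamedTuple, Tuple

class StructuredName(NamedTuple):
    """Illustrative stand-in for a structured name: scheme plus ordered parts."""
    scheme: str
    parts: Tuple[str, ...]

# A capitalized or lowercase word, or a run of capitals (an acronym).
_CAMEL_PART = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def structured_name(item_name):
    """Parse a dotted ("bs.ass.unpaidCap") or camel case item name."""
    if "." in item_name:
        return StructuredName("dotted", tuple(item_name.split(".")))
    # Names without dots are treated as camel case.
    return StructuredName("camelCase", tuple(_CAMEL_PART.findall(item_name)))

for name in ["bs.ass.unpaidCap", "AssetsCurrent", "IFRSRevenue"]:
    print(structured_name(name))
```

Only the dotted scheme encodes hierarchy levels directly; camel case parts are words of a name, so a mapping between the schemes still needs the taxonomy.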
4. Ontology representation of XBRL accounting structures
Understanding XBRL reporting data models in terms of data warehousing metamodel structures paves the way for a further important development. While the CWM conforming DSML from the previous section explicitly refers to specific XBRL reporting metadata structures, a generalization is possible and reasonable. Many dimensions relevant to analyzing XBRL data beyond the single enterprise level are not captured by reporting item taxonomies. A simple example of an additional dimension is the type of economic activity of a company; another example is a regional consolidation of balance sheet figures. In the XBRL community, a dedicated specification of such additional analytic dimensions has
been given in the XBRL dimensions specification [6], which defines additional document structures. A natural alternative to such a specification can be given in terms of a general approach conforming to the CWM metamodel. In fact, since the XBRL reporting hierarchies can be seen as platform specific definitions of general CWM data warehousing concepts related to data cubes and dimensions, a common representation of reporting metadata with other dimensional metadata for company analysis is reasonable. Such a common representation can suitably be given by an ontology for reporting items that abstracts from the XML document structure defined in the XBRL specifications while maintaining full support of taxonomies and their dimensional hierarchies. In this section, we define the essential guiding principles for an ontology representation in OWL 1.1 [20] of business reporting taxonomies and their instance data. For earlier work on XBRL ontologies, the reader is referred to [12]. While the approach in [12] has a few features in common with the one presented here, we focus on a basic ontology structure that is compliant with a data warehousing model [21]. From the modelling perspective, an ontology serves both on the M2 and on the M1 levels. Since ontologies are bound to the restrictions imposed by first order logic (FOL), repeated steps of instantiation as used in the metamodel hierarchy of the model driven architecture are not possible. As a remedy, it has become popular to build separate basic or upper ontologies for the M2 level and to derive from their modelling elements the specific classes corresponding to the M1 level. These different ontologies can be separated symbolically by using different namespaces and physically by using different files or database repositories. The basic purpose of an ontology representation of XBRL metadata is to provide an integrated representation of all XBRL metadata, in particular reporting item taxonomies.
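The separation of an upper (M2) ontology from taxonomy-specific (M1) classes by namespaces can be illustrated without any RDF library, using plain tuples as triples. Both namespace URIs and the upper-level class names are hypothetical; only the rdfs:subClassOf URI is the standard one. The superclasses function computes the transitive subclass closure, a very small part of what an OWL reasoner would infer. Note that the compositional relation between bs and bs.ass is deliberately not expressed as subclassing.

```python
# Hypothetical namespace URIs for an upper (M2) and an M1 ontology.
UPPER = "http://example.org/xbrl-upper#"
DEGAAP = "http://example.org/de-gaap#"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

TRIPLES = {
    # M2 modelling elements, kept in the upper ontology namespace (own file).
    (UPPER + "MonetaryItem", SUBCLASS_OF, UPPER + "ReportingItem"),
    (UPPER + "StringItem", SUBCLASS_OF, UPPER + "ReportingItem"),
    # M1 classes derived from the M2 elements, kept in a separate namespace.
    # The compositional bs -> bs.ass relation is NOT modelled as subClassOf.
    (DEGAAP + "bs", SUBCLASS_OF, UPPER + "StringItem"),
    (DEGAAP + "bs.ass", SUBCLASS_OF, UPPER + "MonetaryItem"),
    (DEGAAP + "bs.ass.unpaidCap", SUBCLASS_OF, UPPER + "MonetaryItem"),
}

def superclasses(cls, triples):
    """Transitive rdfs:subClassOf closure for one class."""
    result, frontier = set(), {cls}
    while frontier:
        found = {o for (s, p, o) in triples if p == SUBCLASS_OF and s in frontier}
        frontier = found - result
        result |= found
    return result

def classes_in(namespace, triples):
    """All subjects declared in one namespace, i.e., one ontology layer."""
    return sorted({s for (s, _, _) in triples if s.startswith(namespace)})

print(classes_in(DEGAAP, TRIPLES))
print(superclasses(DEGAAP + "bs.ass", TRIPLES))
```

In a real deployment, each namespace would typically be serialized to its own OWL file, matching the physical separation mentioned above.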
The benefit is a common view on reporting metadata together with reporting entity (e.g., company) metadata for the purposes of general analysis and decision-making on all levels, from the corporate level up to supranational economic indicator assessment. While this benefit can also be obtained with a UML model, an ontology representation offers additional benefits due to the reasoning capabilities of inference engines operating on ontologies. These reasoning capabilities help to move beyond the logical data integration view to the knowledge integration view, which is characterized by processing heterogeneous datasets together with representations of business rules or regulations. The guiding principles of the ontology representation of XBRL taxonomies are as follows:
- Model transformation of taxonomies: there must be a metamodel-based translation from the XBRL taxonomy definition to suitable ontology constructs.
- Instance to instances transformation: data from an XBRL instance document must be represented as one or more ontology instances. If the representation uses multiple instances, the common context (reporting period, reporting entity identifiers) must be retrievable via suitable properties.
- Aggregation rules separation: reporting elements can be aggregated or used in formulae to provide indicators; these rules are expressed in a separate rules layer and executed using an inference engine.
- Unstructured (e.g., legal) references management: textual reporting data are represented as string properties in the ontology; textual metadata like legal references can be stored in annotations to an ontology class.
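The instance-to-instances principle can be illustrated with a small sketch. Assuming a simplified XBRL instance (one context with an instant period, two monetary facts, no unit definitions or schemaRef), each context and each fact becomes its own individual, and every fact points to its context through a hasContext property, so that the common reporting entity and period remain retrievable. The property names (hasContext, reportingEntity, reportingInstant, monetaryValue, unit) and the gaap-de namespace are hypothetical; only the XBRL 2.1 instance namespace and element names are taken from the standard. Duration periods are not handled.

```python
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"  # XBRL 2.1 instance namespace

# Simplified instance document: one context, two monetary facts.
# Unit definitions and schemaRef are omitted for brevity.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:gaap="http://example.org/gaap-de">
  <xbrli:context id="c2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.org/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2008-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <gaap:Cash contextRef="c2008" unitRef="EUR" decimals="0">1000</gaap:Cash>
  <gaap:Receivables contextRef="c2008" unitRef="EUR" decimals="0">2500</gaap:Receivables>
</xbrli:xbrl>"""


def to_ontology_instances(xml_text):
    """Map each context and each fact to its own individual (as triples)."""
    root = ET.fromstring(xml_text)
    triples, contexts = [], {}
    for ctx in root.findall(XBRLI + "context"):
        ctx_iri = "ctx_" + ctx.get("id")
        contexts[ctx.get("id")] = ctx_iri
        # Only instant periods are handled in this sketch.
        entity = ctx.find(f"{XBRLI}entity/{XBRLI}identifier").text.strip()
        instant = ctx.find(f"{XBRLI}period/{XBRLI}instant").text.strip()
        triples += [(ctx_iri, "rdf:type", "Context"),
                    (ctx_iri, "reportingEntity", entity),
                    (ctx_iri, "reportingInstant", instant)]
    # Facts are the top-level elements carrying a contextRef attribute.
    facts = [el for el in root if el.get("contextRef") is not None]
    for n, fact in enumerate(facts):
        fact_iri = f"fact_{n}"
        item_class = fact.tag.split("}")[-1]  # local name of the reporting item
        triples += [(fact_iri, "rdf:type", item_class),
                    (fact_iri, "hasContext", contexts[fact.get("contextRef")]),
                    (fact_iri, "monetaryValue", fact.text.strip()),
                    (fact_iri, "unit", fact.get("unitRef"))]
    return triples


for triple in to_ontology_instances(INSTANCE):
    print(triple)
```

Representing each fact as a separate individual linked to a shared context individual is one of the options discussed in [12]; a single individual per instance document would be an alternative.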
Note that, from a simple automation point of view, transformations from XML schema documents to ontologies could be generated with very little effort; the TopBraid tool by TopQuadrant, for example, offers a component for this purpose. However, as we have seen, XBRL is developed on the basis of a profound separation of a data layer from a conceptual layer, which leads to a diversity of XML schema documents. In this paper, we base our ontologies on OWL [20]. The main task for an ontology representation of XBRL accounting data and metadata in OWL is to capture the compositional value hierarchy as expressed in the CWM-profile-based metamodel of the preceding section. In developing the approach, we also ensure that the results will be reusable in a general context by enforcing compliance of our ontologies with the recent ontology definition metamodel (ODM, see [11]) issued by OMG. As a prerequisite, an ontology representation of the basic XBRL structures, like a reporting item element and its context (which company is reporting, and which time frame or point in time the item refers to), has to be given. As shown in [12], even on this level there are many design choices. For the present work, we made a few simple adaptations, documented in [22], that allow for a rather straightforward representation of these basic structures compliant with the description logic level of OWL.
4.1. Methodology for ontology engineering of reporting taxonomies
The XBRL reporting taxonomies are described in the specification as conceptual taxonomies. From a general business taxonomy point of view, this description is sufficiently precise. The core property of a conceptual taxonomy is a hierarchy of super- and subconcepts. As multiple hierarchies involving the same conceptual entities may be used for different analytical purposes, reporting taxonomies actually define poly-hierarchies (also called heterarchies) relating the conceptual entities. The mathematical structure of poly-hierarchical conceptual systems has been studied in [24]. Many concept-based search engines (like wissen.de in Germany) make use of the elegant lattice properties of poly-hierarchies. A web-based user interface for constructing poly-hierarchies from entity-property cross-tables has been developed and proposed as a knowledge management tool in [25].
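The lattice view of poly-hierarchies can be made concrete with a minimal sketch of formal concept analysis in the sense of [24]: given a cross-table of objects and attributes, each formal concept pairs a set of objects (extent) with exactly the attributes they share (intent). The cross-table below, with reporting items as objects and invented properties as attributes, is purely hypothetical. The brute-force enumeration is exponential in the number of attributes and is only meant for toy tables; dedicated algorithms such as Ganter's NextClosure scale much better.

```python
from itertools import combinations

# Hypothetical entity-property cross-table (objects x attributes).
CROSS_TABLE = {
    "Cash": {"monetary", "balanceSheet", "current"},
    "Receivables": {"monetary", "balanceSheet", "current"},
    "Machinery": {"monetary", "balanceSheet", "nonCurrent"},
    "Revenue": {"monetary", "incomeStatement"},
    "DeferredTax": {"monetary", "balanceSheet", "incomeStatement"},
}


def extent(attrs, table):
    """Objects that have all attributes in attrs."""
    return frozenset(o for o, props in table.items() if attrs <= props)


def intent(objs, table):
    """Attributes shared by all objects in objs (all attributes if objs is empty)."""
    attributes = set().union(*table.values())
    return frozenset(a for a in attributes if all(a in table[o] for o in objs))


def formal_concepts(table):
    """All (extent, intent) pairs, found by closing every attribute subset."""
    attributes = sorted(set().union(*table.values()))
    concepts = set()
    for k in range(len(attributes) + 1):
        for subset in combinations(attributes, k):
            objs = extent(frozenset(subset), table)
            concepts.add((objs, intent(objs, table)))
    return concepts


def upper_neighbours(concept, concepts):
    """Direct super-concepts: minimal extents strictly containing this extent."""
    larger = [c for c in concepts if concept[0] < c[0]]
    return [c for c in larger
            if not any(concept[0] < d[0] < c[0] for d in larger)]


concepts = formal_concepts(CROSS_TABLE)
for objs, attrs in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

In this toy table the concept for DeferredTax has two direct super-concepts (the balance sheet concept and the income statement concept), which is exactly the poly-hierarchical situation that a single tree cannot express.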
For clarity, in the sequel we will use the term taxonomy in a restricted sense, denoting tree-shaped hierarchies. Poly-hierarchies are assumed to be expressed by multiple taxonomies. Technically, an XBRL linkbase may contain a poly-hierarchy, but this is not common practice, as taxonomy linkbases are used in conjunction with presentation linkbases for generating tree views in user interfaces of XBRL processors. If a conceptual poly-hierarchy is needed, it can always be obtained by applying multiple taxonomies to a reporting element item schema.
In the sequel, we use the concept of an aggregation function. An aggregation function takes one or more arguments as input and combines them into a single-valued output. A simple example of a numeric aggregation function is summation, which is in fact the prevailing aggregation function in business reporting. Note that this concept of an aggregation function corresponds to the usual meaning of aggregation in the context of DML and should not be confused with the concept of aggregation in UML structural modelling [9].
From an ontology definition point of view, conceptual taxonomies can usually be represented as class/subclass relationships. However, for the XBRL reporting taxonomies, this usual approach is inappropriate for several reasons:
1. Values of reporting items on lower levels of the taxonomy are successively aggregated (often by summation) to generate the values of reporting items on higher taxonomy levels. We will refer to this property henceforth as functional value aggregation across the hierarchy (VAH). As a consequence of VAH, the relationship between reporting items on successive taxonomy levels does not correspond exactly to a class hierarchy. In terms of UML class diagrams, a conceptual entity on a higher level of the taxonomy (closer to the root elements) is a composition of lower level entities rather than their superclass.
2. The ITT property (instance transparency of taxonomies, as explained earlier) requires that taxonomy hierarchies can have a different structure for identical reporting items. While an inversion of hierarchical relationships between taxonomies is highly unlikely, an ontology definition of a reporting taxonomy should not a priori exclude possible reorganizations of hierarchies in different taxonomies.
Consequently, an ontology representation of XBRL reporting concept taxonomies should not use a simple class hierarchy. Instead, we propose to distinguish between two basic levels of reporting items:
- atomic reporting items: these are reporting items corresponding to leaf elements of a taxonomy hierarchy tree;
- composed or relational reporting items: these are reporting items not corresponding to leaf elements of a taxonomy hierarchy tree. In many cases, the monetary values in composed reporting items are derived by an aggregation function defined on the collection of next lower level reporting items.
Thus, the representation of reporting taxonomies in an ontology needs a formalism for compositional hierarchies. This need arises very often in ontology engineering, e.g., in anatomy and in spatial object modelling. These and other application domains have motivated the formalization of mereologies, or simply part-of-relationship ontologies. A recent formulation of a mereology can be found in the LKIF ontology of the EU Estrella project [26]. A key element of any mereology is the part relation (or object property, in OWL parlance). The part relation (read as hasPart) is transitive (parts of parts of an object are parts of the object) and asymmetric (if a has part b, then b does not have part a). Objects that have no parts are atoms in the mereology, and objects that have at least one part are wholes. In many applications, we require a non-transitive subrelation of part. Basically, non-transitivity means that parts compose a whole only on some given level of an aggregation or composition (e.g., a roof tile is part of a roof, but not by itself part of a house). For these non-transitive relations, LKIF has the subrelation strictPart (again, read this as hasStrictPart). As an aside, note that all mereology relations in an OWL ontology are OWL object properties, which are extensionally interpreted as sets of ordered pairs. Subrelations are interpreted as subsets of such sets. Therefore, it is legitimate to have a non-transitive subproperty of a transitive property. The converse relations of the part and strictPart relations are part_of and strictPart_of, respectively. Moreover, the LKIF mereology defines member and composition relations as subrelations of strictPart.
The intended meaning of the composition relation here is that each step in an iterated composition relationship usually involves a combination operation (like arithmetic aggregation in the case of monetary reporting items). The additional composition operation is the reason why UML specifies the composition meta-association, distinguishing it from mere aggregational part-of relationships as we find them in spatial objects. Analogously, the member relation corresponds to a mereological relationship in the set-theoretic sense, since set membership is not transitive, see [27]. In order to allow for a special treatment of spatial relationships, the LKIF ontology specifies spatial containment and spatial composition relationships.
A mereology can be used as an upper ontology for an ontology representation of an XBRL reporting taxonomy in an apparently straightforward way:
- atomic reporting items: to be represented as mereology atoms;
- composed or relational reporting items: to be represented as mereology wholes.
Fig. 4 shows the ontological foundation for representing an atomic monetary item using the LKIF mereology framework.
Fig. 4. Mereology foundation of ontology representation of an XBRL monetary item (for further explanation, see text).
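The relationship between strictPart and part, and the distinction between atoms and wholes, can be illustrated with a small sketch. Assuming a hypothetical set of hasStrictPart assertions for a fragment of a balance sheet taxonomy, the smallest transitive relation containing strictPart is computed as a stand-in for the inferred part (hasPart) relation; the code then checks asymmetry and separates atoms (items without parts) from wholes. Item names are illustrative, and the closure is computed naively in plain Python rather than by an OWL reasoner.

```python
from itertools import product

# Hypothetical hasStrictPart assertions: (whole, direct part).
STRICT_PART = {
    ("TotalAssets", "FixedAssets"),
    ("TotalAssets", "CurrentAssets"),
    ("CurrentAssets", "Cash"),
    ("CurrentAssets", "Receivables"),
    ("FixedAssets", "Machinery"),
}


def transitive_closure(pairs):
    """Smallest transitive relation containing pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


def is_asymmetric(relation):
    """True if no pair (a, b) has its inverse (b, a) in the relation."""
    return all((b, a) not in relation for a, b in relation)


part = transitive_closure(STRICT_PART)  # hasPart, a transitive super-relation
items = {x for pair in STRICT_PART for x in pair}
wholes = {a for a, _ in part}           # items with at least one part
atoms = items - wholes                  # items without parts

print(sorted(atoms))   # ['Cash', 'Machinery', 'Receivables']
print(sorted(wholes))  # ['CurrentAssets', 'FixedAssets', 'TotalAssets']
```

Cash is a strict part of CurrentAssets and CurrentAssets a strict part of TotalAssets, yet (TotalAssets, Cash) appears only in part, not in strictPart, which mirrors the roof tile example in the text.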
The remaining question to answer is how to define appropriate specializations of the part or strictPart relationships for such an ontology. In order to explain a reasonable choice, we need to digress to the recent OMG ontology definition metamodel (ODM) specification.
4.2. Relational monetary items and association classes in UML
If subclassing is inappropriate for representing the compositional value hierarchy relationships in XBRL, a suitable approach can be derived from the UML [9] notion of an association class, which combines modelling features of a class with those of an association. In particular, if we use an association related to a parent–child relationship, a suitable UML formulation for relational monetary XBRL items can be derived. An association class is defined by
- a class axiom (or, in logical terms, a comprehension axiom, see [27]);
- one or more functional object properties with ranges corresponding to the classes participating in the association;
- zero or more (functional or datatype) properties characterizing the association class itself.
A common example of an association class is the order class associating a customer with a product list (or service list). Both customers and products or services are themselves entities with proprietary properties. The cardinality of both associations is n : 1, i.e., an order is commissioned by exactly one customer and refers to exactly one products or services list, while, viewed from the participants' side, a customer may place multiple orders and a products or services list can appear in multiple orders. In particular, an association class may have a member slot (or property) that corresponds to the result of an aggregation function applied to the values of suitable properties of the classes participating in the association. Thus, representing relational XBRL monetary items in a compositional hierarchy in terms of UML association classes is a reasonable choice. A UML class diagram implementing this approach is given in Fig. 5; a corresponding metamodel diagram is easy to draw. Please note that the usual UML notation for an association class, a dotted line decorating an association link between classes, cannot be applied in our case, as we have n-ary associations from a balance sheet item to its child items. The usual remedy is to represent the association class simply as a class, take the diamond symbol representing an n-ary association, and define this association as the range of a functional attribute of the association class.
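The order example can be sketched with Python dataclasses, purely as an illustration of the association class pattern and not as part of the XBRL ontology. Order is the association class; its two single-valued attributes play the role of the functional properties pointing to the participants, and the derived total plays the role of a slot computed by an aggregation function (here summation) over the participants' properties. All class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Customer:
    name: str


@dataclass(frozen=True)
class ProductLine:
    product: str
    price: Decimal
    quantity: int


@dataclass(frozen=True)
class Order:
    """Association class: exactly one customer, exactly one product list."""
    customer: Customer                      # functional property (n : 1)
    product_list: tuple[ProductLine, ...]   # functional property (n : 1)

    @property
    def total(self) -> Decimal:
        # Slot derived by an aggregation function (summation) over participants.
        return sum((line.price * line.quantity for line in self.product_list),
                   Decimal("0"))


alice = Customer("Alice")
order = Order(alice, (ProductLine("Widget", Decimal("2.50"), 4),
                      ProductLine("Gadget", Decimal("10.00"), 1)))
print(order.total)  # prints 20.00
```

The same customer object can take part in any number of Order instances, which reflects the 1 : n view from the participants' side.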
Fig. 5. UML class diagram with CWM stereotypes for the key metamodel elements of the XBRL ontology (simplified).
4.3. Association classes for relational monetary items in OWL
According to the ontology definition metamodel [11], these are the basic principles of representing n-ary relationships as association classes in OWL:
- A relationship class in OWL is an ordinary OWL class given specific functional properties for each attribute of the n-ary relationship. Logically, these functional properties correspond to the projections of the multi-attribute relationship onto one particular attribute. The range of each such property is the domain of the respective attribute.
Using these rules, the basic principle underlying the representation of the relational monetary items of an XBRL reporting taxonomy can then be stated as follows:
- Each atomic reporting item can be represented as an ontology class.
- A relational or composed reporting item is represented as an association class.
- Relational or composed reporting items involving relational or composed subitems are simply represented as association classes with some or all of their participating members being association classes themselves.
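These principles lend themselves to a mechanical translation. The following sketch assumes a hypothetical taxonomy given as a mapping from each composed item to its child items and emits Turtle in which leaf items become subclasses of AtomicMonetaryItem, composed items become association classes (subclasses of RelationalMonetaryItem), and each child link becomes a functional object property whose domain is the association class and whose range is the child class. The namespace IRIs, the prefix xo and the property naming scheme are assumptions made for illustration, not the design used in [22].

```python
# Hypothetical taxonomy: composed item -> child items (simplified assets view).
TAXONOMY = {
    "TotalAssets": ["FixedAssets", "CurrentAssets"],
    "FixedAssets": ["Machinery", "Buildings"],
    "CurrentAssets": ["Cash", "Receivables"],
}


def to_turtle(taxonomy):
    """Emit OWL classes and functional properties in Turtle syntax."""
    composed = set(taxonomy)
    children = {c for kids in taxonomy.values() for c in kids}
    lines = [
        "@prefix : <http://example.org/gaap-de#> .",
        "@prefix xo: <http://example.org/xbrl-upper#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    # Leaf items: ordinary classes under AtomicMonetaryItem.
    for item in sorted(children - composed):
        lines.append(f":{item} a owl:Class ; rdfs:subClassOf xo:AtomicMonetaryItem .")
    # Composed items: association classes with one functional property per child.
    for item in sorted(composed):
        lines.append(f":{item} a owl:Class ; rdfs:subClassOf xo:RelationalMonetaryItem .")
        for child in taxonomy[item]:
            prop = f"{item[0].lower()}{item[1:]}Has{child}"
            lines.append(f":{prop} a owl:ObjectProperty, owl:FunctionalProperty ;")
            lines.append(f"    rdfs:domain :{item} ;")
            lines.append(f"    rdfs:range :{child} .")
    return "\n".join(lines)


print(to_turtle(TAXONOMY))

# A second, hypothetical taxonomy regrouping the same atomic items (ITT):
LIQUIDITY_VIEW = {"LiquidAssets": ["Cash", "Receivables"]}
print(to_turtle(LIQUIDITY_VIEW))
```

Because the atomic classes (here Cash and Receivables) are emitted identically for both taxonomies, a further taxonomy only adds new association classes, which is how the representation keeps the ITT property.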
These principles allow a complete ontology representation of the XBRL XML schema describing items, their contexts and their taxonomy relationships on the OWL description logic level. An example of this representation for an ontology model on the M1 level is given in Fig. 6 in UML notation for the assets section of a balance sheet taxonomy (for Germany). In order to comply with the ontology design principles entailed by the restrictions of FOL, the association classes are shown as specializations of the RelationalMonetaryItem class rather than as stereotyped model elements. Notice that the ITT principle is fully preserved in this representation, as relational items representing multiple taxonomies can easily be added to the ontology while keeping the identical atomic item classes.
4.4. Benefits of an ontology representation of business reporting language structures
The primary benefit of the proposed ontology representation of XBRL structures (and of the UML representation that is canonically associated with it) is to enable a semantically rich processing model of reporting data in innovative business intelligence applications, as discussed in the introduction. An additional benefit is the use of inference mechanisms on reporting data. In the first place, these mechanisms will allow automated checks of reporting data consistency and compliance in an elegant way. Second, inferencing is important in checking reporting data against advanced mathematical risk models (like in [18]) or qualitative assessments, and against business rules or regulations in general. Finally, inferencing can help to overcome a disadvantage of the linkbase representation of taxonomies, as explained in the next paragraph.
One technical disadvantage of the linkbase representation of an XBRL taxonomy is that the key relational properties of taxonomies, namely hierarchy level transitivity and asymmetry, are obscured. To test a linkbase document for these properties, explicit consistency checks need to be run. Updating a linkbase implies re-checking the entire hierarchical structure. In addition, it is difficult to compare the position of items in different linkbases. Whether the element naming convention (like the dotted notation following hierarchy levels) is in sync with the taxonomy linkbase must be checked separately. If the items in a US and a German XBRL instance are named differently and are organized in different taxonomies, finding out what might happen to a report of the German subsidiary of a US-based company once data are integrated can become very difficult. Finally, if we define specific analytic applications, e.g., for risk capital assessment, the relevant formula definitions might have to be restated for each different reporting taxonomy linkbase. This contradicts the regulators' requirement to use more or less globally applicable measurement, assessment or ranking procedures.
This disadvantage could be overcome if it were possible to traverse all elements of a taxonomy or conceptual hierarchy recursively in a simple way. This capability could be provided by logic programming constructs if a suitable representation of XBRL conceptual hierarchies were available. By defining an OWL representation [28,20] of at least part of the XBRL conceptual layer together with the fact elements, we pave the way for such applications. A powerful reasoning language for OWL ontologies is the Jena rule language, part of the open-source Jena framework for Java-based RDF and OWL persistence, querying and reasoning. The availability of logical inferencing applied to XBRL conceptual structures also makes it possible to answer queries for all atomic monetary items contributing to a given aggregated item.
Fig. 6. Example of proposed XBRL ontology design for a specific balance sheet assets taxonomy. Relational monetary items are represented as association classes linking the corresponding subitems in the given reporting taxonomy.
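The paper points to Jena rules for such recursive queries. As a language-neutral illustration only (plain Python, not Jena rule syntax), the sketch below shows the two kinds of inference discussed above on a hypothetical taxonomy and hypothetical reported values: a recursive traversal that collects all atomic items contributing to an aggregated item and rejects cyclic (non-asymmetric) hierarchies, and a VAH consistency check that flags composed items whose reported value differs from the sum of their children.

```python
# Hypothetical taxonomy (composed item -> children) and reported values.
TAXONOMY = {
    "TotalAssets": ["FixedAssets", "CurrentAssets"],
    "FixedAssets": ["Machinery"],
    "CurrentAssets": ["Cash", "Receivables"],
}
VALUES = {"TotalAssets": 3700, "FixedAssets": 200, "CurrentAssets": 3500,
          "Machinery": 200, "Cash": 1000, "Receivables": 2500}


def atomic_contributors(item, taxonomy, _path=()):
    """All atomic items contributing to item; raises ValueError on a cycle."""
    if item in _path:
        raise ValueError("cycle in taxonomy at " + item)
    children = taxonomy.get(item)
    if not children:          # leaf element: an atomic reporting item
        return {item}
    result = set()
    for child in children:
        result |= atomic_contributors(child, taxonomy, _path + (item,))
    return result


def summation_violations(taxonomy, values):
    """Composed items whose value differs from the sum of their children (VAH)."""
    violations = {}
    for item, children in taxonomy.items():
        expected = sum(values[c] for c in children)
        if values[item] != expected:
            violations[item] = (values[item], expected)
    return violations


print(sorted(atomic_contributors("TotalAssets", TAXONOMY)))
# prints ['Cash', 'Machinery', 'Receivables']
print(summation_violations(TAXONOMY, VALUES))  # prints {}
```

Because the traversal works on whatever taxonomy mapping it is given, the same check can be run unchanged against different reporting taxonomies, which is the point made above about restating formula definitions per linkbase.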
5. Conclusion The present paper shows the importance of choosing appropriate modelling frameworks in the design of domain ontologies for real world applications. The topic here is the special case of a highly elaborate existing information structure framework, namely the XBRL specification for business reporting languages. From a simple automation point of view, transformations from
XML schema documents to ontologies could have been generated with very little effort. However, as we have seen, XBRL is developed on the basis of a profound separation of a data layer from a conceptual layer, which leads to a diversity of XML schema documents. The important component of the conceptual layer to be adequately captured in an ontology is the conceptual hierarchy or taxonomy. XBRL is richer than most ontologies, as it allows a multitude of taxonomies to be applied to given reporting instance data. Thus, capturing XBRL by one generic taxonomy would have missed the target completely. Therefore, we concentrated on the modelling principles governing conceptual hierarchies on the one hand and aggregation hierarchies as used in data warehouse dimensions on the other hand, referring to UML class models, the OMG CWM and the OMG ontology definition metamodel. As a matter of fact, XBRL taxonomies can only be understood from a common integrative perspective based on all of these modelling principles. Future work, especially integration with reasoners in the framework of MUSING services, will demonstrate the integrative capabilities of business performance management integrating reporting data and other business data in a common knowledge discovery framework.
Acknowledgements
This paper gives theoretical foundations for ontology design as proposed to the EU MUSING project (see http://www.musing.eu, grant EU IP 27097), under which this work was also funded. The author is indebted to many colleagues from the MUSING partner teams for discussions of options and benefits of ontologies for implementing the next generation of knowledge-intensive financial services. I would like to name in particular Mrs Monika Jungemann-Dorner of Verband der Vereine Creditreform, Dr. Paolo Lombardi of Bank Monte dei Paschi di Siena, Drs. Thierry Declerck and H.-U. Krieger of DFKI Saarbrücken, Prof. Paolo Giudici of Pavia University, and Prof. Ron Kenett of KPA and Torino University. Moreover, I would like to thank the members of the U Innsbruck team as of May 2008, especially Mag. Daniel Bachlechner, the author of XSLT scripts implementing some of the transformations defined in this document, and Dipl.-Inf. Christian Leibold. Finally, two anonymous reviewers gave me very important hints on some issues regarding the relationship of ontologies with the modelling levels in UML.
References
[1] Basel Committee on Banking Supervision, International convergence of capital measurement and capital standards: a revised framework - comprehensive version, Technical Report, Bank for International Settlements, 2006.
[2] E. Cohen, T. Lutes, XBRL in tax and government, Public working draft, XBRL International, 2005.
[3] P. Engel, M. Stanley, W. Hamscher, G. Shuetrim, D. van Kannon, H. Wallis, Extensible business reporting language (XBRL), Recommendation, XBRL International, 2003.
[4] M.J. Firestone, Enterprise Information Portals and Knowledge Management, Butterworth-Heinemann, Elsevier Science, Amsterdam, Boston, 2003, kmac.
[5] W. Hamscher, G. Shuetrim, Formula specification requirements, Public working draft, XBRL International, 2007.
[6] I. Hernandez-Ros, H. Wallis, XBRL Dimensions 1.0, Recommendation, XBRL International, 2006.
[7] W.B. Inmon, Building the Data Warehouse, Wiley, New York, 2002.
[8] F. Jouault, J. Bézivin, KM3: a DSL for metamodel specification, in: FMOODS, 2006.
[9] Object Management Group, Unified modeling language: superstructure specification, 2007.
[10] Object Management Group, Unified modeling language: infrastructure, 2007.
[11] Ontology definition metamodel specification, Technical Report, Object Management Group, 2006.
[12] T. Declerck, H.-U. Krieger, Translating XBRL into description logic. An approach using protege, sesame and OWL, Technical Report, German Research Center for Artificial Intelligence (DFKI), 2006.
[13] K. Bontcheva, C. Brewster, F. Ciravegna, H. Cunningham, L. Guthrie, R. Gaizauskas, et al., Using human language technology for acquiring, retrieving and publishing knowledge in AKT: position paper, in: Workshop on Human Language Technology and Knowledge Management, Toulouse, France, 2001, http://www.elsnet.org/acl2001-hlt+km.html.
[14] R.S. Kenett, O. Rapaheli, Multivariate methods in enterprise system implementation, risk management and change management, Int. J. Risk Assess. Manage. 9 (3) (2008) 258–276.
[15] Microsoft, MDX language reference, Microsoft Corporation, 2008. Available from: http://technet.microsoft.com/en-us/library/ms145595.aspx.
[16] Mondrian Project, How to design a Mondrian schema, Pentaho Open Business Intelligence, 2008. Available from: http://mondrian.pentaho.org/documentation/schema.php.
[17] Oracle, OLAP DML reference 10g release 2 (10.2), Technical Report, Part number B14346-01, Oracle, 2006.
[18] E. Yashchin, Modeling of risk losses using size-based data, IBM J. Res. Dev. 51 (3–4) (2007) 309–323.
[19] ABZ Informatik, The adaptive business reporting automat, Technical Report, XBRLOpen.org, Fraunhofer IPSI, 2007.
[20] B. Motik, P. Patel-Schneider, I. Horrocks, OWL 1.1 web ontology language structural specification and functional-style syntax, 2006.
[21] D. Chang, S. Iyengar, Common warehouse metamodel (CWM) specification, Object Management Group, 2001.
[22] M. Spies, D. Bachlechner, Transformations of business reporting language constructs from and to ontology languages with XSLT, Technical Report, Innsbruck University, 2008.
[23] M. Kay, XSL transformations (XSLT), Version 2.0, W3C World Wide Web Consortium, 2007.
[24] B. Ganter, R. Wille, Formale Begriffsanalyse, Springer, Berlin, 1996.
[25] M. Spies, Portalbasiertes Wissensmanagement und seine Unterstützung durch automatische Generierung von Begriffssystemen, in: H. Mandl (Ed.), Wissensmanagement, Hogrefe, Göttingen, 2004, pp. 277–287.
[26] J. Breuker, R. Hoekstra, A. Boer, K.v.d. Berg, G. Sartor, R. Rubino, A. Wyner, T. Bench-Capon, M. Palmirani, OWL Ontology of Basic Legal Concepts (LKIF-Core), 22 January 2007.
[27] W.v.O. Quine, Mengenlehre und ihre Logik, Ullstein, Berlin, 1978.
[28] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. McGuinness, P. Patel-Schneider, L. Stein, OWL web ontology language reference, Technical Report, W3C, 2004.
[29] S. DeRose, E. Maler, D. Orchard, XML linking language (XLink), Version 1.0, W3C World Wide Web Consortium, 2001.