An ontology modelling perspective on business reporting


ARTICLE IN PRESS Information Systems 35 (2010) 404–416

Contents lists available at ScienceDirect

Information Systems journal homepage: www.elsevier.com/locate/infosys

An ontology modelling perspective on business reporting

Marcus Spies

Knowledge Management, Ludwig-Maximilians-Universität München; Digital Enterprise Research Institute, Universität Innsbruck, Austria

Article info

Keywords: Enterprise information integration and interoperability; Languages for conceptual modelling; Ontological approaches to content and knowledge management; Ontology-based software engineering for enterprise solutions; Domain engineering

Abstract

In this paper, we discuss the motivation and the fundamentals of an ontology representation of business reporting data and metadata structures as defined in the eXtensible business reporting language (XBRL) standard. The core motivation for an ontology representation is the enhanced potential for integrated analytic applications that build on quantitative reporting data combined with structured and unstructured data from additional sources. Applications of this kind will enable significant enhancements in regulatory compliance management, since they combine business analytics with inference engines for statistical as well as logical inferences. In order to define a suitable ontology representation of business reporting language structures, an analysis of the logical principles of the reporting metadata taxonomies and further classification systems is presented. Based on this analysis, a representation of the generally accepted accounting principles taxonomies in XBRL by an ontology provided in the web ontology language (OWL) is proposed. An additional advantage of this representation is its compliance with the recent ontology definition metamodel (ODM) standard issued by OMG. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

In the present decade, several severe crises in international financial markets have been caused by failures of financial institutions to adequately balance credit risks with the capital required to cover probable and, at least to some degree, unexpected losses. As a consequence, comprehensive legislation and regulations have been devised and implemented in order to allow banks and other financial institutions to better assess credit risks and to allow supervisory bodies to enforce the building of adequate capital reserves. One of the best understood and most widely implemented regulations is the Basel II accord [1]. A second major area of regulation in the financial sector is related to business reporting. Driven by the need to integrate data across countries and business sectors for adequate economic decisions on national and supranational political levels in the growing free trade areas, e.g., in Europe or in the Americas, business reporting data structures and practices have to follow a set of common representation schemes and rules. The two major regulatory efforts in this respect are the international financial reporting standards (IFRS) and the eXtensible business reporting language (XBRL) [3].

Corresponding author at: Institute of Informatics, LMU University of Munich, Leopoldstr. 13, D-80802 Munich, Germany. Tel.: +49 89 2180 5164. E-mail address: [email protected]

0306-4379/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.is.2008.12.003

1.1. Problem statement

The key management discipline to have emerged in reply to the increasing regulatory compliance requirements is business performance management. Well-structured and standards compliant reporting data, by themselves, represent only a one-sided approach to better business performance management in any practice, since improvement is primarily achieved in terms of adequate assessment and application of due diligence in balancing any risks affecting the business. The full potential of risk-aware business performance management can be realized only through the integration of reporting with business analytics, as is common in existing data warehousing and data mining solutions. These solutions have been part of, or woven into, the enterprise IT infrastructures of nearly all major enterprises for many years; recently, they are being disseminated as SME specific solutions as a result of the SME specific IT services strategies of most major vendors of enterprise IT applications. Now, when it comes to integrating the comparatively new standardized business reporting data with existing analytic applications from the data warehousing and data mining fields, architects of business performance management solutions face barriers related to application integration and information integration (see [4]). The main information integration barrier is that the new business reporting languages (XBRL and its extensions) are based on data taxonomies represented in a format (described in more detail below) that is not readily transferable to or integrable with standard data warehousing schemata, e.g., the well-known star schema with its combination of a fact table with several dimension tables. The key application integration barrier is that reporting data processors are based on XML technologies like XSLT 2.0 [23] and XQuery, while data warehouse processing (and the data mining analytics building on it) is based on advanced SQL query processing, often integrated with programming language wrapping (e.g., Oracle PL/SQL). Recently, in order to overcome these barriers, the XBRL community has proposed function [5] and dimension [6] specifications that are designed to cover at least part of the data warehousing analytical solution space by extending the scope of XBRL processing.
Other developments in this area related to our problem statement are the data cube definition and manipulation language MDX (multi-dimensional expressions, see [15]) by Microsoft and the OLAP data manipulation language (DML) by Oracle [17]. These languages allow elegant formulations of data cube schema definitions and complex manipulations, abstracting from the underlying database table representation structure (relational, multidimensional). In the open source community, a related solution is Mondrian, which is part of the Pentaho open source business intelligence suite [16]. Mondrian allows users to define warehouse schemata interactively and generates the SQL statements needed. MDX is being integrated into the web service API XML-A (XML for analysis). Currently, neither MDX nor XML-A has an explicit relationship to XBRL. Consequently, while we certainly need information and application integration in the fields of standards compliant business reporting and business analytics, the current information structure standards do not match well, and the solution space on the vendor as well as on the open source side is fragmented. This is the problem which the present paper attempts to start solving. We hope that, based on the work initiated in this paper, a sufficiently general metamodelling perspective on business reporting languages will be established that will allow XBRL dimensions and functions to be integrated easily into data warehouse definition and manipulation languages.

This paper presents a first building block towards an integrative approach to business performance management solutions that integrate standards compliant reporting and business analytics. The solution approach is to define a common metamodel in terms of a sufficiently rich modelling language that allows for mapping transformations between the schemata in question. This metamodel is stated here in general terms and is used to define unified modelling language (UML) class diagrams and web ontology language (OWL) ontologies as general representation formats for XBRL structures and instance data. With this step, applications integrating XBRL data with conventional warehouse data can build on the usual entity relationship models derived from UML for the reporting data, see [9,11]. Moreover, semantic technologies make an additional set of innovative services possible, which will be briefly described.

1.2. Context: semantics enabled business intelligence services

An important motivation for investigating a transformation between XBRL reporting language structures and OWL ontologies is the context of the present work within the EU integrated project MUSING (MUlti-industry, semantic-based next generation business INtelliGence, see http://www.musing.eu). Specifically, MUSING proposes to use ontologies as the key integrative knowledge and data schemata representation formalism for a multitude of services, notably those based on applications that integrate text mining with traditional business intelligence applications. A striking example of the advantage of this integration is the exploitation of textual reporting items in XBRL reporting structures for extended analytical applications. These textual items often convey key information to the business analyst that remains unused in purely quantitative analytic applications.
Components in MUSING using ontologies for financial reporting are:

• balance sheet data extraction from pdf documents: the pdf2xbrl tool of DFKI Saarbrücken, see [12];
• text mining for balance sheet relevant information: the GATE tool (Sheffield University) provides annotations of texts from news services (ontology enabled text classifications) and extractions of specific facts from annotated texts [13];
• probabilistic analysis of reporting data: statistical analysis of risks using event, cause and area taxonomies (for a general introduction, see [14], and for an example in the field of loss analysis, see [18]).

Both pdf2xbrl and GATE create new instances in existing ontologies from textual input. This functionality is often referred to as ontology population. Thus, these tools serve as a semantic ETL (extraction, transformation and loading) layer in a semantics enabled business analytics architecture. The ontologies can generally be compared to data warehouse schemata; however, they contain much more information, which will be described in detail in the relevant MUSING publications. The basic purpose of MUSING is to integrate these and further components using an enterprise service bus architecture for flexible and dynamically composable services in various business scenarios related to risk management. To give an impression of the potential of the approach, we mention services in the three key industry areas of MUSING together with examples of relevant textual data. Since MUSING is a large scale integrated project, each industry area services domain has a driving industry component integrator, which is mentioned in parentheses:

• Financial risk management: services supporting advanced rating and assessment procedures for banks. Textual data come from balance sheet explanatory notes and from analyst texts (Banca di Monte dei Paschi di Siena).
• IT operational risk management: services supporting assessment and mitigation of risks related to IT managed operations. Textual data come from written failure descriptions by customer organizations (KPA).
• Internationalization consulting: services supporting internationalization of operations. Textual data come from target region government and analyst reports (Verband der Vereine Creditreform).

2. XBRL: a brief tour

The fundamental idea of XBRL [3] is to allow for a conceptual and physical separation of reporting facts from reporting metadata. The metadata are used to convey the conceptual meaning of reporting data items in a standardized way. Facts are represented in XBRL instance documents using XML elements, and metadata are available via linkbases (documents specifying typed links between referenced elements in XML documents based on the XLink specification, see [29]). In a full XBRL reporting metadata representation, one XML schema file is needed to define the instance data elements, and at least four linkbases are needed to define various aspects on the conceptual level:

• label: human readable descriptions of reporting elements in a given natural language;
• calculation: aggregation rules for reporting items;
• presentation: hierarchies and priorities for representing reporting elements in graphical interfaces;
• reference: text from legal documents or regulations applicable to a given reporting item.

In older versions, an additional definition linkbase was used to represent the reporting element hierarchies; it has recently been merged into the presentation linkbase hierarchy. Fig. 1 gives an overview of this overall structure.

[Fig. 1 diagram labels: concepts meta-data; facts data; reporting taxonomy; reporting items; label; legal reference; calculation rules; context (reporting entity, reporting period); currency (unit, ...).]

Fig. 1. Illustration of the basic structure of XBRL that builds on a distinction of the fact layer from the conceptual layer. Quoted from the ABRA toolset documentation [19].

Key elements of the conceptual component of XBRL are reporting item taxonomies. These taxonomies are related to conceptual hierarchies and often correspond to aggregation hierarchies on the calculation level. For example, a company may deliver a reporting document myReportDE.xml complying with the elements required in a Germany specific reporting XML schema document. A small snippet of the current XML schema defining the reporting elements for the generally accepted accounting principles (GAAP) in Germany looks as follows:

    <element name="bs" type="xbrli:stringItemType" abstract="true"
             substitutionGroup="xbrli:item" nillable="true"
             id="de-gaapci_bs" xbrli:periodType="instant"/>
    <element name="bs.ass" type="xbrli:monetaryItemType"
             substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true"
             id="de-gaapci_bs.ass" xbrli:balance="debit" xbrli:periodType="instant"/>
    <element name="bs.ass.unpaidCap" type="xbrli:monetaryItemType"
             substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true"
             id="de-gaapci_bs.ass.unpaidCap" xbrli:balance="debit" xbrli:periodType="instant"/>
    <element name="bs.ass.unpaidCap.called" type="xbrli:monetaryItemType"
             substitutionGroup="de-gaapci:hbst.changeposition.item" nillable="true" ...

A reporting document compliant with this schema, like myReportDE.xml, consists essentially of a flat list of XML elements corresponding to the items defined here. The use of such a document for analysis in terms of the metadata layer is enabled by the XBRL framework for reporting item hierarchies. This framework is based on the XLink specification and uses linkbases to express a variety of semantic attributes of, and relationships between, reporting items. Specifically, presentation (formerly additional definition) linkbases contain item–subitem relationships in an explicit format for each item/subitem pair. In current XBRL taxonomies, these relationships are stated as parent–child links. For example, in the income statement hierarchy for German GAAP, we have

    <presentationArc xlink:type="arc"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
        xlink:from="is.netIncome" xlink:to="is.netIncome.regular"
        use="optional" order="1.0"/>
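Parent–child arcs like the one above can be read with a standard XML parser. The Python sketch below, using only the standard library, collects presentation arcs into a parent-to-children map ordered by the arcs' order attribute. It is a simplification under stated assumptions: the linkbase fragment is hand-written, arcs refer directly to item names instead of going through the xlink:label values of <loc> elements as real XBRL linkbases do, and the item is.netIncome.exampleChild is hypothetical.

```python
import xml.etree.ElementTree as ET

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
PARENT_CHILD = "http://www.xbrl.org/2003/arcrole/parent-child"

# Hand-written fragment: real linkbases route arcs through <loc> elements,
# here the xlink:from/xlink:to values are simply item names.
# The first arc (to is.netIncome.exampleChild) is hypothetical.
LINKBASE = """<presentationLink xmlns="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <presentationArc xlink:type="arc"
      xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
      xlink:from="is.netIncome" xlink:to="is.netIncome.exampleChild"
      use="optional" order="2.0"/>
  <presentationArc xlink:type="arc"
      xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
      xlink:from="is.netIncome" xlink:to="is.netIncome.regular"
      use="optional" order="1.0"/>
</presentationLink>"""


def children_map(xml_text):
    """Map each parent item to its children, sorted by the arcs' order attribute."""
    root = ET.fromstring(xml_text)
    arcs = []
    for arc in root.iter(LINK + "presentationArc"):
        if arc.get(XLINK + "arcrole") == PARENT_CHILD:
            arcs.append((arc.get(XLINK + "from"),
                         float(arc.get("order", "1")),  # XBRL default order is 1
                         arc.get(XLINK + "to")))
    tree = {}
    for parent, _order, child in sorted(arcs):
        tree.setdefault(parent, []).append(child)
    return tree


hierarchy = children_map(LINKBASE)
print(hierarchy)  # {'is.netIncome': ['is.netIncome.regular', 'is.netIncome.exampleChild']}
```

Note that the instance document itself plays no role here: the hierarchy is recovered entirely from the linkbase.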

An advantage of this representation is that it allows a hierarchy to be declared in a way that is transparent to the instance data. Therefore, a key feature of XBRL is that an instance document may be analyzed according to different taxonomies, depending on the goal of an analytic application. For example, the German GAAP XBRL taxonomy allows myReportDE.xml to be used in financial analytic applications related to tax calculations. Now, for the purposes of data integration and integrated analytics, say, on a European level, the same instance document myReportDE.xml can be described with a different taxonomy that might group reporting items in a different way, leave out some of them, etc. This is possible by processing myReportDE.xml with an XBRL processor that uses the taxonomy linkbase(s) relevant to the EU wide integrated analytics. An important application of this principle is the definition of financial ratios from balance sheet items or weighted sums of them. Financial ratios are widely used by rating agencies to assess or predict company performance. An example of calculations of Basel II relevant financial ratios from XBRL reporting items is given in Fig. 2. In principle, no changes to the structure of the instance document myReportDE.xml are needed for this and related analytic usages. In order to support localization, national language specific labels may be given to balance sheet items without changing their basic names as XML elements. The mapping of XBRL instance document elements needed for analytics on a supranational level is enabled by an additional specification in the XBRL family, namely the XBRL general ledger [2]. Often, as in the examples given here, the naming scheme of XBRL items is related to the taxonomy on a national level. Items in European XBRL dialects are named with taxonomy levels appended to each other in a dotted hierarchical notation.
As an example, a balance sheet (a string item with name bs) has a section corresponding to assets as opposed to equities and liabilities; the item name for the assets total is then bs.ass. The assets are further decomposed, among others, into fixed and current assets, leading to the item name bs.ass.fixedAss for the fixed assets total, and so on. In addition, apart from integration applications, specific analytic needs arise in the context of compliance management. For example, the balance sheet instance myReportDE.xml might need to be re-analyzed for specific risk analysis indicators with a suitable linkbase that states which items compose which risk-analytic quantity. Additionally, specific computations might be needed for risk-analytic quantities that take into account further reporting data outside the balance sheet, e.g., items that are part of the profit and loss (PNL) statement. The definition of such computations is expressible with XBRL function documents. To sum up, by applying different taxonomy linkbases to a given XBRL instance document, many analytic applications with different scopes and purposes are possible without modifying the instance document. Henceforth, we refer to this property of XBRL compliant business reports as the instance transparency of taxonomies (ITT) property. This property has legal implications as well, since an electronically submitted reporting document should be digitally signed by the issuing organization and should therefore not allow for uncontrolled modifications in the course of analytical evaluations.

3. Principles and design rationale of a metamodel capturing XBRL metadata

In order to define the methodological principles of an ontology representation of the XBRL reporting structures appropriately, we first need to examine the modelling language of these structures using standard modelling approaches in the context of the UML. In particular, as specified in [9,10], it is useful to distinguish at least four levels in modelling a particular domain:

• The instance level (M0) comprises the actual objects of the domain in question, e.g., an XBRL reporting document submitted by a given company in a given reporting period.
• The instance model level (M1) comprises the XBRL analytic taxonomies and calculation rules relevant for a reporting instance document, e.g., a balance sheet item hierarchy for commercial and industrial enterprises in the scope of the fiscal legislation in the United States.
• The modelling language level (M2) comprises the principles of defining analytic taxonomies, e.g., the set of arc types and link roles defined in the XBRL specification for representing a reporting taxonomy. This level is often referred to as the metamodel level. It typically contains model elements that are related to a general information representation approach, like entity relationship structures in database design.
• The modelling language definition level (M3) comprises the basic object modelling language or modelling elements in terms of which the M2 level language (the metamodel) can be built. This level is generally referred to as the metametamodel level. For UML, this level contains the so-called Meta Object Facility, which is a set of entirely generic object oriented modelling features and services.

Fig. 2. Example of the usage of balance sheet items for the calculation of capital relevant to Basel II risk management.

In practice, more than these four levels occur. This is the case in particular if so-called domain specific modelling languages (DSML) are needed that are more specific than, say, a general database design model. A good example of a DSML is given by logical design and query modelling languages for online analytic processing (OLAP) in the realm of data warehousing, e.g., Microsoft's MDX. XBRL can be seen as a DSML as well, since it specifies a dedicated set of XML schema and XLink arc types and link roles for the purpose of modelling the domain of financial reporting information structures. From the point of view of UML modelling, XBRL clearly specifies documents on the M0 and M1 levels. Instance documents are M0 level entities, whereas taxonomies and dimension definitions together constitute a set of M1 level models. The XBRL specification is itself a metamodel (on the M2 level, in generic terms); however, it is entirely domain specific. This means that we do not have (at least to the best of the author's knowledge) a generic modelling approach into which XBRL fits. From a practical standpoint, this is of no concern as long as we confine ourselves to using this reporting language for interaction between businesses and governmental or regulatory institutions. The general embedding of XBRL becomes interesting, however, in the context of the problem statement of the present paper. In practice, it is necessary to integrate XBRL compliant reporting with other taxonomy-based approaches, e.g., OLAP as delivered in standard business intelligence suites from many vendors. A general embedding of XBRL in object modelling is also key to defining the modelling principles of an ontology representation of XBRL reporting structures that is designed to integrate financial reporting with information extraction from analysis reports, etc., as in the MUSING project (see above). Therefore, we now present an approach to understanding XBRL in terms of a metamodel that is part of a general modelling approach conforming with UML. Metamodels in the realm of data analysis and data warehousing can be seen as models of metadata, see [21]. Since XBRL concept taxonomies, calculation linkbases and related documents constitute an M1 model of the reporting data in XBRL instance documents, a UML conformant metamodel of these documents must capture the general classes and relationships characterizing these metadata.
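To make the level distinction more tangible, the Python sketch below defines a few M2-level metamodel classes (the names ReportingItem, ParentChildArc, Taxonomy and Fact are hypothetical, chosen for illustration), builds a two-arc fragment of the German GAAP taxonomy from them as an M1 model, and records one M0 fact. A programming language collapses M1 and M0 into plain objects, so the sketch only marks which level each object plays; the company name and value are invented.

```python
from dataclasses import dataclass, field


# --- M2: metamodel classes (hypothetical names, for illustration only) ---
@dataclass(frozen=True)
class ReportingItem:
    name: str         # e.g. "bs.ass"
    item_type: str    # e.g. "xbrli:monetaryItemType"
    period_type: str  # "instant" or "duration"


@dataclass(frozen=True)
class ParentChildArc:
    parent: ReportingItem
    child: ReportingItem


@dataclass
class Taxonomy:
    name: str
    arcs: list = field(default_factory=list)

    def children(self, item):
        """Return the direct subitems of `item`, in arc order."""
        return [arc.child for arc in self.arcs if arc.parent == item]


@dataclass(frozen=True)
class Fact:
    item: ReportingItem
    value: float
    entity: str
    period: str


# --- M1: a fragment of the German GAAP taxonomy, built from M2 classes ---
bs = ReportingItem("bs", "xbrli:stringItemType", "instant")
bs_ass = ReportingItem("bs.ass", "xbrli:monetaryItemType", "instant")
unpaid_cap = ReportingItem("bs.ass.unpaidCap", "xbrli:monetaryItemType", "instant")
de_gaap = Taxonomy("de-gaap fragment",
                   [ParentChildArc(bs, bs_ass), ParentChildArc(bs_ass, unpaid_cap)])

# --- M0: data of one (invented) report, referring to M1 items ---
report = [Fact(bs_ass, 1000.0, "ExampleCo", "2008-12-31")]

print([c.name for c in de_gaap.children(bs)])  # ['bs.ass']
```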
From the XML schema perspective, a metamodel for XBRL can be constructed using the XBRL schema documents together with the XLink schema (which can be generated from the published XLink DTD). This kind of metamodel would constitute a display surface metamodel according to the definition in [8]. From the more general perspective of metadata exchange between different applications living on different platforms (like data warehousing platforms based on Mondrian, SQL DML, DMX, etc., see above), a metamodel formulation is desirable that captures the basic structure of the business reporting domain. From the modelling standards point of view, a natural general choice for a metamodel in this situation is the common warehouse metamodel (CWM) as issued by OMG [21]. CWM is a metamodel for business analytics metadata exchange. In its present version, issued before the latest UML specification, CWM also contains a general object metamodel. For our purposes, in particular, CWM comprises an OLAP package. Therefore, viewing key XBRL structures on the M2 level in their relationship to CWM will enable us to define transformations from XBRL taxonomies to specific analytic workspace structures in a specific data warehouse platform. While a full representation of XBRL in terms of CWM is outside the scope of the present paper, we will dwell on monetary reporting items here and outline a metamodel of the XBRL document structures in terms of CWM. The main idea behind the representation proposed here is that the metadata in an XBRL model for monetary reporting items are tantamount to defining a data cube in the data warehousing sense (see, e.g., [7]). The monetary items in a reporting document are reconstructed as a fact table with monetary variable values (characterized by metadata defining context and currency). According to this metamodel, a balance sheet is composed of two dimensions, namely those corresponding to the assets and the equities/liabilities subsections of the reporting document. All discoverable taxonomies in an XBRL taxonomy linkbase applicable to the reporting document are captured as additional dimensions of the analytic data cube. Further dimensions can be constructed from the so-called context data of an XBRL instance document referring to the reporting period, the reporting company, its industry sector, and so on. They can be addressed on the specific modelling level M1 using the aforementioned XBRL dimension specification. The hierarchy in a warehouse dimension for monetary reporting items is represented in XBRL by the parent–child relationships stated in a taxonomy linkbase document. A fact table may contain additional numeric and/or textual attributes. It is important to adequately capture the conceptual hierarchy or taxonomy component of XBRL in a data warehouse metamodel view. This conceptual hierarchy is not just a business taxonomy. Business taxonomies and vocabularies are captured in CWM in the business nomenclature package, which is separate from the OLAP package.
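The reconstruction of monetary items as a fact table can be sketched as follows. This is a minimal illustration, not the paper's transformation: it assumes facts have already been read from an instance document as (item, contextRef, unit, value) tuples, uses invented values and a hypothetical item bs.ass.currAss, and treats the dotted item names as a level-based hierarchy for the roll-up.

```python
from collections import namedtuple

# Invented sample data shaped like XBRL facts: (item, contextRef, unit, value).
# "bs.ass.currAss" is a hypothetical item name next to bs.ass.fixedAss.
FACTS = [
    ("bs.ass.fixedAss", "c2008", "EUR", 700.0),
    ("bs.ass.currAss", "c2008", "EUR", 300.0),
]
# Hypothetical context definitions: reporting entity and period per contextRef.
CONTEXTS = {"c2008": {"entity": "ExampleCo", "period": "2008-12-31"}}

FactRow = namedtuple("FactRow", "item_key context_key unit value")


def build_star(facts, contexts):
    """Split flat facts into an item dimension, a context dimension and a fact table."""
    item_dim = {}  # reporting item name -> surrogate key
    rows = []
    for name, ctx_id, unit, value in facts:
        key = item_dim.setdefault(name, len(item_dim))
        rows.append(FactRow(key, ctx_id, unit, value))
    used = {ctx_id for _, ctx_id, _, _ in facts}
    context_dim = {cid: contexts[cid] for cid in sorted(used)}
    return item_dim, context_dim, rows


def rollup(rows, item_dim, parent):
    """Sum the values of all items below `parent`, reading dotted names as
    hierarchy levels. Assumes only leaf items are reported, otherwise
    subtotals would be counted twice."""
    keys = {k for name, k in item_dim.items() if name.startswith(parent + ".")}
    return sum(r.value for r in rows if r.item_key in keys)


item_dim, context_dim, rows = build_star(FACTS, CONTEXTS)
print(rollup(rows, item_dim, "bs.ass"))  # 1000.0
```

In a warehouse, item_dim and context_dim would become dimension tables and rows the fact table of a star schema; the reporting items remain the single source of values.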
XBRL taxonomies are to be characterized as compositional hierarchies rather than pure conceptual hierarchies since, as we saw, the values of many monetary reporting items on higher hierarchy layers are obtained from aggregation functions applied to subordinate item values. The following definitions from [21] apply precisely to XBRL monetary item hierarchies:

A Dimension has zero or more Hierarchies. A Hierarchy is an organizational structure that describes a traversal pattern through a Dimension, based on parent/child relationships between members of a Dimension. Hierarchies are used to define both navigational and consolidation/computational paths through the Dimension (i.e., a value associated with a child member is aggregated by one or more parents).

In particular, CWM distinguishes level hierarchies from value hierarchies. While the former possess a well-defined number of layers, the latter can be expressed only in a DAG (directed acyclic graph) linking parent and child elements. Common examples of level hierarchies include region maps with each level corresponding to a regional entity containing entities of the next lower level (e.g., a city contained in a region contained in a country). Typically, there are several members at each level of a level-based hierarchy. Again, citing from [21], we have:

A ValueBasedHierarchy defines a hierarchical ordering of members in which … the topological structure of the hierarchy conveys meaning. … ValueBasedHierarchy can be used to model pure linked node hierarchies (e.g., asymmetric hierarchical graphs or parent-child tables). LevelBasedHierarchy contains an ordered collection of HierarchyLevelAssociations that defines the natural hierarchy of the Dimension. The ordering defines the hierarchical structure in top-down fashion … .

Fig. 3. Basic relationships between OMG common warehouse metamodel (CWM) elements and eXtensible business reporting language (XBRL) modelling structures. XBRL structures are defined in terms of corresponding CWM metamodel stereotypes where applicable. Note that not all XBRL structures mentioned here are explicitly defined in a set of XBRL domain specific metadata documents. For convenience, XBRL file structures defined to contain the related information for a specific XBRL reporting model are also contained in the diagram (for further explanations, see text).

While the specification of an XBRL taxonomy using XLink in an XML linkbase document evidently uses a topological representation in terms of a directed acyclic parent–child graph, the general representation of such a taxonomy is not limited to this topology. In fact, the naming of reporting item elements by concatenating dotted level names (as demonstrated above) corresponds to a level-based rather than a value-based hierarchy. Similarly, for many implementation purposes, a representation in terms of a level-based hierarchy is better suited, since it makes explicit the aggregation or consolidation steps used in the statistical analysis of XBRL documents. Fig. 3 summarizes the key metamodel elements that are useful for a data warehouse conformant view of key XBRL metadata structures. For example, an XBRL reporting taxonomy (physically, usually a presentation linkbase file) is stated to correspond to a dimension deployment within a cube dimension (not shown). The single arcs in such a linkbase correspond to a StructureMap in CWM that defines the hierarchical structure of the dimension by parent/child relationships. The elements between which such relationships are stated are given in an XML schema file listing the reporting elements by name, type and some further metadata.
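The relation between the value-based (parent–child) and the level-based view can be checked mechanically. The Python sketch below is one possible approach, not a CWM algorithm: it assigns each member of a parent–child DAG its depth by breadth-first search from the roots and reports members reachable at two different depths, which cannot be placed on a single level. The arc list reuses the item names from the German GAAP snippet; the small conflicting graph in the test uses hypothetical members a, b and c.

```python
from collections import deque


def levels_from_arcs(arcs):
    """Derive a level number for every member of a parent-child DAG.

    Breadth-first search from the roots (members without a parent). A member
    reached at two different depths cannot sit on one level and is reported
    as a conflict; its descendants are not re-examined along the second path.
    """
    children, has_parent = {}, set()
    for parent, child in arcs:
        children.setdefault(parent, []).append(child)
        has_parent.add(child)
    roots = [n for n in children if n not in has_parent]
    depth, conflicts = {}, set()
    queue = deque((root, 0) for root in roots)
    while queue:
        node, d = queue.popleft()
        if node in depth:
            if depth[node] != d:
                conflicts.add(node)
            continue
        depth[node] = d
        for child in children.get(node, []):
            queue.append((child, d + 1))
    return depth, conflicts


ARCS = [("bs", "bs.ass"), ("bs.ass", "bs.ass.unpaidCap"),
        ("bs.ass.unpaidCap", "bs.ass.unpaidCap.called")]
depth, conflicts = levels_from_arcs(ARCS)
# Dotted names encode the same levels: level = number of dots in the name.
consistent = all(d == name.count(".") for name, d in depth.items())
print(depth, conflicts, consistent)
```

For a strictly tree-shaped taxonomy with dotted names, the derived levels agree with the dot count, which is the situation in which a level-based representation is straightforward.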

The structured name metaclass generalizes the two naming schemes used for XBRL reporting items based on short names like bs for balance sheet. One naming scheme uses dotted notation, the other camel case capitalization. Technically, Fig. 3 is based on a UML 2.1 profile representation of the CWM metamodel (a UML 1.4 model in the current specification). The stereotypes from this profile are used here for the metamodelling or M2 level, which amounts to specifying XBRL as a domain specific modelling language based on a CWM profile. The UML 2.1 representation of CWM was generated using Eclipse EMF, and the transformation to a UML profile was realized with MagicDraw 15.5.
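A structured name in this sense can be decomposed into its parts regardless of the naming scheme. The helpers below are hypothetical and only illustrate the idea: dotted names are split at the dots (and also yield a parent name), while other names are treated as camel case. The name balanceSheetAssets is an invented camel case example; note that a parent can be derived reliably only from the dotted scheme.

```python
import re


def name_parts(item_name):
    """Split a reporting item name into its components (hypothetical helper).

    Dotted names are split at the dots. Other names are treated as camel case:
    words start with an optional capital letter followed by lowercase letters
    or digits, and runs of capitals not followed by a lowercase letter
    (acronyms) are kept together.
    """
    if "." in item_name:
        return item_name.split(".")
    return re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", item_name)


def dotted_parent(item_name):
    """Parent item implied by dotted notation, or None for a top-level item."""
    head, sep, _ = item_name.rpartition(".")
    return head if sep else None


print(name_parts("bs.ass.fixedAss"))     # ['bs', 'ass', 'fixedAss']
print(name_parts("balanceSheetAssets"))  # ['balance', 'Sheet', 'Assets']
print(dotted_parent("bs.ass.fixedAss"))  # bs.ass
```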

4. Ontology representation of XBRL accounting structures

Understanding XBRL reporting data models in terms of data warehousing metamodel structures paves the way for a further important development. While the CWM conformant DSML from the previous section explicitly refers to specific XBRL reporting metadata structures, a generalization is possible and reasonable. Many dimensions relevant to analyzing XBRL data beyond the single enterprise level are not captured by reporting item taxonomies. A simple example of an additional dimension is the type of economic activities of a company; another example is a regional consolidation of balance sheet figures. In the XBRL community, a dedicated specification of such additional analytic dimensions has been given in the XBRL dimensions specification [6], which defines additional document structures. A natural alternative to such a specification can be given in terms of a general approach conforming to the CWM metamodel. In fact, since the XBRL reporting hierarchies can be seen as platform specific definitions of general CWM data warehousing concepts related to data cubes and dimensions, a common representation of reporting metadata together with other dimensional metadata for company analysis is reasonable. Such a common representation can suitably be given by an ontology for reporting items that abstracts from the XML document structure defined in the XBRL specifications while maintaining full support for taxonomies and their dimensional hierarchies. In this section, we define the essential guiding principles for an ontology representation in OWL 1.1 [20] of business reporting taxonomies and their instance data. For earlier work on XBRL ontologies, the reader is referred to [12]. While the approach in [12] has a few features in common with the one presented here, we focus on a basic ontology structure that is compliant with the data warehousing metamodel [21]. From the modelling perspective, an ontology serves both on the M2 and on the M1 levels. Since ontologies are bound to the restrictions imposed by first order logic (FOL), repeated steps of instantiation as they are used in the metamodel hierarchy of the model driven architecture are not possible. As a remedy, it has become popular to build separate basic or upper ontologies for the M2 level and to derive the specific classes corresponding to the M1 level from their modelling elements. These different ontologies can be separated symbolically by using different namespaces and physically by using different files or database repositories. The basic purpose of an ontology representation of XBRL metadata is to provide an integrated representation of all XBRL metadata, in particular reporting item taxonomies.
The benefit is a common view on reporting metadata together with reporting entity (e.g., company) metadata for the purposes of general analysis and decision-making on all levels, from the corporate level up to supranational economic indicator assessment. While this benefit can also be obtained by a UML model, an ontology representation offers additional benefits due to the reasoning capabilities of inference engines operating on ontologies. These reasoning capabilities help to go beyond the logical data integration view to a knowledge integration view, which is characterized by processing heterogeneous datasets together with representations of business rules or regulations. The guiding principles of the ontology representation of XBRL taxonomies are the following:

- Model transformation of taxonomies: there must be a metamodel-based translation from the XBRL taxonomy definition to suitable ontology constructs.
- Instance to instances transformation: data from an XBRL instance document must be represented as an ontology instance or instances. If the representation uses multiple instances, the common context (reporting period, reporting entity identifiers) must be retrievable via suitable properties.
- Aggregation rules separation: reporting elements can be aggregated or used in formulae to provide indicators; these rules are expressed in a separate rules layer and executed using an inference engine.
- Unstructured (e.g., legal) references management: textual reporting data are represented as string properties in the ontology; textual metadata like legal references can be stored in annotations to an ontology class.
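To make the instance to instances principle concrete, the following minimal Python sketch maps XBRL-like facts onto subject-predicate-object triples, with plain tuples standing in for an RDF store. Each fact becomes one instance, and the shared context becomes a separate instance that every fact points to, so that entity and period remain retrievable via properties. The prefixes `rep:` and `xbrlo:`, the property names `hasContext`, `hasValue`, `entityIdentifier` and `periodEnd`, and the sample identifiers are hypothetical illustrations; they are not names defined by XBRL, by OWL or by the transformations documented in [22].

```python
from dataclasses import dataclass

# Hypothetical prefixes: "rep:" for M1 report data, "xbrlo:" for the upper ontology.
REP = "rep:"
XO = "xbrlo:"


@dataclass(frozen=True)
class Context:
    entity_id: str   # reporting entity identifier
    period_end: str  # reporting instant as an ISO date string


@dataclass(frozen=True)
class Fact:
    concept: str     # reporting item name, e.g. "Cash"
    value: float
    context_id: str  # key into the contexts mapping


def to_triples(facts, contexts):
    """Turn facts and their shared contexts into (subject, predicate, object) triples."""
    triples = []
    for cid, ctx in contexts.items():
        subject = REP + cid
        triples.append((subject, "rdf:type", XO + "Context"))
        triples.append((subject, XO + "entityIdentifier", ctx.entity_id))
        triples.append((subject, XO + "periodEnd", ctx.period_end))
    for i, fact in enumerate(facts):
        subject = f"{REP}fact{i}"
        triples.append((subject, "rdf:type", REP + fact.concept))
        triples.append((subject, XO + "hasValue", fact.value))
        triples.append((subject, XO + "hasContext", REP + fact.context_id))
    return triples


def context_of(triples, fact_uri):
    """Recover a fact's reporting context by following its hasContext property."""
    ctx = next(o for s, p, o in triples if s == fact_uri and p == XO + "hasContext")
    return {p: o for s, p, o in triples if s == ctx and p != "rdf:type"}
```

Because the context is stored once and referenced, two facts from the same report return the same context dictionary; a real implementation would of course use an RDF library and proper IRIs instead of prefixed strings.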

Note that, from a simple automation point of view, transformations from XML schema documents to ontologies could be generated with very low effort; the TopQuadrant tool TopBraid offers a component for this purpose. However, as we have seen, XBRL is developed on the basis of a fundamental separation of a data layer from a conceptual layer, which leads to a diversity of XML schema documents. In this paper, we base our ontologies on OWL [20]. The main task for an ontology representation of XBRL accounting data and metadata in OWL is to capture the compositional value hierarchy as expressed in the CWM-profile-based metamodel of the preceding section. In developing the approach, we also ensure that the results will be reusable in a general context by enforcing compliance of our ontologies with the recent ontology definition metamodel (ODM, see [11]) issued by OMG. As a prerequisite, an ontology representation of the basic XBRL structures, like a reporting item element and its context (which company is reporting, and which time frame or point in time the item refers to), has to be given. As shown in [12], there are already many choices on this level. For the present work, we made a few simple adaptations, documented in [22], that allow for a rather straightforward representation of these basic structures compliant with the description logic level of OWL.

4.1. Methodology for ontology engineering of reporting taxonomies

The XBRL reporting taxonomies are described in the specification as conceptual taxonomies. From a general business taxonomy point of view, this description is sufficiently precise. The core property of a conceptual taxonomy is a hierarchy of super- and subconcepts. As multiple hierarchies involving the same conceptual entities may be used for different analytical purposes, reporting taxonomies actually define poly-hierarchies (also called heterarchies) relating the conceptual entities. The mathematical structure of poly-hierarchical conceptual systems has been studied in [24]. Many concept-based search engines (like wissen.de in Germany) make use of the elegant lattice properties of poly-hierarchies. A web-based user interface for constructing poly-hierarchies from entity-property cross-tables has been developed and proposed as a knowledge management tool in [25].
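As an illustration of how a poly-hierarchy arises from an entity-property cross-table, the sketch below enumerates the formal concepts (pairs of an object set and the attribute set they share) in the spirit of formal concept analysis [24]. It is a brute-force sketch, not the algorithm of [24] or the tool of [25]: it closes every attribute subset, which is exponential and only suitable for tiny tables. The reporting items and their attributes in the usage example are made up.

```python
from itertools import combinations


def extent(table, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(obj for obj, props in table.items() if attrs <= props)


def intent(table, objs, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= table[obj]
    return frozenset(shared)


def formal_concepts(table):
    """All (extent, intent) pairs of a cross-table, found by closing every
    attribute subset. Exponential in the number of attributes."""
    all_attrs = frozenset().union(*table.values())
    concepts = set()
    for k in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), k):
            ext = extent(table, frozenset(subset))
            concepts.add((ext, intent(table, ext, all_attrs)))
    return concepts


def superconcepts(concepts, ext):
    """Concepts whose extent strictly contains ext (its ancestors in the lattice)."""
    return {c for c in concepts if ext < c[0]}
```

With a hypothetical table in which Cash is both "current" and "monetary", the concept containing only Cash lies below two incomparable concepts ("current" and "monetary"), so the resulting concept lattice is a poly-hierarchy rather than a tree.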


For clarity, in the sequel, we will use the term taxonomy in a restricted sense, denoting tree-shaped hierarchies. Poly-hierarchies are assumed to be expressed by multiple taxonomies. Technically, an XBRL linkbase may contain a poly-hierarchy, but this is not common practice, as taxonomy linkbases are used in conjunction with presentation linkbases for generating tree views in user interfaces to XBRL processors. If a conceptual poly-hierarchy is needed, this can always be accomplished by applying multiple taxonomies to a reporting element item schema. In the sequel, we use the concept of an aggregation function. An aggregation function takes one or more arguments as input and combines them into a single-valued output. A simple example of a numeric aggregation function is summation, which is actually the prevailing aggregation function in business reporting. Note that this concept of an aggregation function corresponds to the usual meaning of aggregation in the context of DML and should not be confused with the concept of aggregation in UML structural modelling [9]. From an ontology definition point of view, conceptual taxonomies can usually be represented as class/subclass relationships. However, for the XBRL reporting taxonomies, this usual approach is inappropriate for two reasons:

1. Values of reporting items on lower levels of the taxonomy are successively aggregated (often by summation) to generate the values of reporting items on higher taxonomy levels. We will refer to this property henceforth as functional value aggregation across the hierarchy (VAH). As a consequence of VAH, the relationship between reporting items on successive taxonomy levels does not correspond exactly to a class hierarchy. In terms of UML class diagrams, a conceptual entity on a higher (closer to the root elements) level of the taxonomy is a composition of lower level entities rather than their superclass.
2. The ITT property (instance transparency of taxonomies, as explained earlier) requires that taxonomy hierarchies can have a different structure for identical reporting items. While inversion of hierarchical relationships between taxonomies is highly unlikely, an ontology definition of a reporting taxonomy should not a priori exclude possible reorganizations of hierarchies in different taxonomies.

Consequently, an ontology representation of XBRL reporting concept taxonomies should not use a simple class hierarchy. Instead, it is proposed to distinguish between two basic levels of reporting items:

- atomic reporting items: these are reporting items corresponding to leaf elements of a taxonomy hierarchy tree;
- composed or relational reporting items: these are reporting items not corresponding to leaf elements of a taxonomy hierarchy tree. In many cases, the monetary values in composed reporting items are derived by an aggregation function defined on the collection of next lower level reporting items.

Thus, the representation of reporting taxonomies in an ontology needs a formalism for compositional hierarchies. This need arises very often in ontology engineering, e.g., in anatomy and in spatial object modelling. These and other application domains have motivated the formalization of mereologies, or simply part-of-relationship ontologies. A recent formulation of a mereology can be found in the LKIF ontology of the EU Estrella project [26]. A key element of any mereology is the part relation (or object property, in OWL parlance). The part relation (read as hasPart) is transitive (parts of parts of an object are parts of the object) and asymmetric (if a has part b, then b does not have part a). Objects that have no parts are atoms in the mereology, and objects that have at least one part are wholes. In many applications, we require a non-transitive subrelation of part. Basically, non-transitivity means that parts compose a whole only on some given level of an aggregation or composition (e.g., a roof tile is part of a roof, but not by itself part of a house). For these non-transitive relations, LKIF has the subrelation strictPart (again, read this as hasStrictPart). As an aside, note that all mereology relations in an OWL ontology are OWL object properties, which are extensionally interpreted as sets of ordered pairs. Subrelations are interpreted as subsets of such sets. Therefore, it is legitimate to have a non-transitive subproperty of a transitive property. The converse relations of the part and strictPart relations are part_of and strictPart_of, respectively. Moreover, the LKIF mereology defines member and composition relations as subrelations of strictPart.

The intended meaning of the composition relation here is that each step in an iterated composition relationship usually involves a combination operation (like arithmetic aggregation in the case of monetary reporting items). This additional operation is the reason why UML specifies the composition meta-association, distinguishing it from mere aggregational part-of relationships as we find them in spatial objects. Analogously, the member relation corresponds to a mereological relationship in the set-theoretic sense, since set membership is not transitive, see [27]. In order to allow for a special treatment of spatial relationships, the LKIF ontology specifies spatial containment and spatial composition relationships. A mereology can be used as an upper ontology for an ontology representation of an XBRL reporting taxonomy in an apparently straightforward way:

- atomic reporting items: to be represented as mereology atoms;
- composed or relational reporting items: to be represented as mereology wholes.

Fig. 4 shows the ontological foundation for representing an atomic monetary item using the LKIF mereology framework.
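The set-theoretic argument that a non-transitive subproperty of a transitive property is legitimate can be checked on the house, roof and tile example from the text. In the Python sketch below, plain sets of ordered (whole, part) pairs stand in for the extensions of OWL object properties. As a simplifying assumption of this sketch only, hasPart is taken to be the transitive closure of hasStrictPart; LKIF itself does not prescribe this construction.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given (whole, part) pairs."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def is_asymmetric(rel):
    # (a, a) pairs also violate asymmetry, since (a, a) is its own converse.
    return all((b, a) not in rel for (a, b) in rel)


def atoms(objects, part_rel):
    """Objects with no parts; every other object is a whole."""
    wholes = {whole for (whole, _) in part_rel}
    return {o for o in objects if o not in wholes}


# Direct hasStrictPart facts, read as (whole, part), following the text's example.
strict_part = {("house", "roof"), ("roof", "tile")}
# Assumption for this sketch: hasPart is the transitive closure of hasStrictPart.
part = transitive_closure(strict_part)
```

Here `strict_part` is a subset of `part`, `part` is transitive and asymmetric, while `strict_part` is not transitive: the tile is part of the house but not a strict part of it. The only atom is the tile; the house and the roof are wholes.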


Fig. 4. Mereology foundation of ontology representation of an XBRL monetary item (for further explanation, see text).

The remaining question is how to define appropriate specializations of the part or strictPart relationships for such an ontology. In order to explain a reasonable choice, we need to digress to the recent OMG ontology definition metamodel (ODM) specification.

4.2. Relational monetary items and association classes in UML

If subclassing is inappropriate for representing the compositional value hierarchy relationships in XBRL, a suitable approach can be derived from the UML [9] notion of an association class, which combines modelling features of a class with those of an association. If we use, in particular, an association related to a parent–child relationship, a suitable UML formulation for relational monetary XBRL items can be derived. An association class is defined by

- a class axiom (or, in logical terms, a comprehension axiom, see [27]);
- one or more functional object properties with ranges corresponding to the classes participating in the association;
- zero or more (functional or datatype) properties characterizing the association class itself.

A common example of an association class is the order class associating a customer with a product list (or service list). Both customers and products or services are themselves entities with their own properties. The cardinality of both associations is n : 1, i.e., an order is commissioned by exactly one customer and refers to exactly one products or services list, while, viewing the association from the participants' sides, a customer may place multiple orders and a products or services list can appear in multiple orders. In particular, an association class may have a member slot (or property) that corresponds to the result of an aggregation function applied to the values of suitable properties of the classes participating in the association. Thus, representing relational XBRL monetary items in a compositional hierarchy by UML association classes is a reasonable choice. A UML class diagram implementing this approach is given in Fig. 5; a corresponding metamodel diagram is easy to draw. Please note that the usual UML notation for an association class, a dashed line attaching the class to an association link between classes, cannot be applied in our case, as we have n-ary associations from a balance sheet item to its child items. The usual remedy is to represent the association class simply as a class, take the diamond symbol representing an n-ary association, and define this association as the range of a functional attribute of the association class.
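The order example can be sketched in Python with frozen dataclasses: the association class holds exactly one value for each participant (the functional properties) and exposes a slot whose value is derived by an aggregation function, here summation over the participating list. The class and field names `Customer`, `ProductList`, `Order`, `prices` and `total` are hypothetical and serve only to illustrate the UML notion.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Customer:
    name: str


@dataclass(frozen=True)
class ProductList:
    prices: Tuple[float, ...]  # unit prices of the listed products or services


@dataclass(frozen=True)
class Order:
    """Association class: one functional property per participant
    (exactly one customer, exactly one product list) plus a slot that is
    derived by an aggregation function over a participant's properties."""
    customer: Customer
    products: ProductList

    @property
    def total(self) -> float:
        # Aggregation function: summation over the participating list.
        return sum(self.products.prices)
```

The n : 1 cardinalities follow from the data rather than from the classes: the same `Customer` or `ProductList` object may appear in any number of `Order` instances, while each order has exactly one of each.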

4.3. Association classes for relational monetary items in OWL

According to the ontology definition metamodel [11], these are the basic principles of representing n-ary relationships as association classes in OWL:

- A relationship class in OWL is an ordinary OWL class given specific functional properties for each attribute of the n-ary relationship. Logically, these functional properties correspond to the projections of the multi-attribute relationship onto one particular attribute. The range of each such property is the domain of the respective attribute.

Fig. 5. UML class diagram with CWM stereotypes for the key metamodel elements of the XBRL ontology (simplified).

Using these rules, the basic representation principle underlying the representation of the relational monetary items of an XBRL reporting taxonomy can then be stated as follows:

- Each atomic reporting item can be represented as an ontology class. A relational or composed reporting item is represented as an association class. Relational or composed reporting items involving relational or composed subitems are simply represented as association classes with some or all of their participating members being association classes themselves.
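The representation principle, together with VAH and ITT, can be illustrated by a small Python sketch. Atomic items are leaf objects; a relational item is an association-class-like object whose `members` tuple stands in for one functional property per participating sub-item, and members may themselves be relational. Values are derived by summation over the next lower level. The two taxonomies, the item names and the amounts are made up; `AtomicMonetaryItem` and `RelationalMonetaryItem` are Python stand-ins chosen for readability, not the OWL classes of Fig. 6.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class AtomicMonetaryItem:
    """Leaf reporting item: its value is read directly from instance data."""
    name: str


@dataclass(frozen=True)
class RelationalMonetaryItem:
    """Composed item modelled as an association class. The members tuple
    stands in for one functional property per participating sub-item;
    members may themselves be relational items (nested association classes)."""
    name: str
    members: Tuple["Item", ...]


Item = Union[AtomicMonetaryItem, RelationalMonetaryItem]


def value(item: Item, facts: Dict[str, float]) -> float:
    """Derive a value by summing over the next lower level (VAH with summation)."""
    if isinstance(item, AtomicMonetaryItem):
        return facts[item.name]
    return sum(value(member, facts) for member in item.members)


# Shared atomic item classes (hypothetical names).
cash, receivables = AtomicMonetaryItem("Cash"), AtomicMonetaryItem("Receivables")
inventory, machinery = AtomicMonetaryItem("Inventory"), AtomicMonetaryItem("Machinery")

# Taxonomy A groups current assets first.
current_a = RelationalMonetaryItem("CurrentAssets", (cash, receivables, inventory))
total_a = RelationalMonetaryItem("TotalAssetsA", (current_a, machinery))

# Taxonomy B regroups the same atoms differently (ITT): monetary assets first.
monetary_b = RelationalMonetaryItem("MonetaryAssets", (cash, receivables))
total_b = RelationalMonetaryItem("TotalAssetsB", (monetary_b, inventory, machinery))
```

Both taxonomies reuse the identical atomic items, and with facts such as Cash 100, Receivables 250, Inventory 150 and Machinery 500 both totals evaluate to 1000, although their intermediate items differ.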

These principles allow a complete ontology representation of the XBRL XML schema describing items, their contexts and their taxonomy relationships on the OWL description logic level. An example of this representation for an ontology model on the M1 level is given in Fig. 6 in UML notation for the assets section of a balance sheet taxonomy (for Germany). In order to comply with the ontology design principles entailed by the restrictions of FOL, the association classes are shown as specializations of the RelationalMonetaryItem class rather than as stereotyped model elements. Notice that the ITT principle is fully preserved in this representation, as relational items representing multiple taxonomies can easily be added to the ontology while keeping the atomic item classes identical.

4.4. Benefits of an ontology representation of business reporting language structures

The primary benefit of the proposed ontology representation of XBRL structures (and the UML representation that is canonically associated with it) is to enable a semantically rich processing model of reporting data in innovative business intelligence applications, as discussed in the introduction. An additional benefit is the usage of inference mechanisms on reporting data. First, these mechanisms will allow automated checks of reporting data consistency and compliance in an elegant way. Second, inferencing is important in checking reporting data against advanced mathematical risk models (like in [18]) or qualitative assessments, and against business rules or regulations in general. Finally, inferencing can help to overcome a disadvantage of the linkbase representation of taxonomies, as explained in the next paragraph.

One technical disadvantage of the linkbase representation of an XBRL taxonomy is that the key relational properties of taxonomies, namely hierarchy level transitivity and asymmetry, are obscured. In order to test a linkbase document for these properties, explicit consistency checks need to be run. Updating a linkbase implies re-checking the entire hierarchical structure. In addition, it is difficult to compare the position of items in different linkbases. Checking whether the element naming convention (like the dotted notation following hierarchy levels) is in sync with the taxonomy linkbase must be performed additionally. If the items in a US and a German XBRL instance are named differently and are organized in different taxonomies, finding out what might happen to a report of the German subsidiary of a US-based company once data are integrated can become very difficult. Finally, if we define specific analytic applications, like those for risk capital assessment, the relevant formula definitions might have to be restated for each different reporting taxonomy linkbase. This contradicts the requirement by regulators to use more or less globally applicable measurement, assessment or ranking procedures.
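The explicit consistency checks mentioned above can be sketched as follows, assuming the parent-child arcs of a linkbase have already been extracted into (parent, child) pairs; the extraction from XLink documents is not shown, and the function name and message format are hypothetical. The sketch checks tree shape (at most one parent per item) and acyclicity, which is equivalent to asymmetry of the transitive hierarchy relation.

```python
from collections import defaultdict


def check_taxonomy_arcs(arcs):
    """Report violations of tree shape in (parent, child) arcs that are meant
    to form one or more tree-shaped taxonomies."""
    problems = []
    parents = defaultdict(set)
    children = defaultdict(set)
    for parent, child in arcs:
        parents[child].add(parent)
        children[parent].add(child)

    # Tree shape: every item has at most one parent in a given taxonomy.
    for child in sorted(parents):
        if len(parents[child]) > 1:
            problems.append(f"{child} has several parents: {sorted(parents[child])}")

    # Asymmetry of the transitive hierarchy is equivalent to acyclicity,
    # checked here by depth-first search with three node states.
    unvisited, in_progress, done = 0, 1, 2
    state = defaultdict(int)  # every node starts as unvisited (0)

    def visit(node):
        state[node] = in_progress
        for nxt in sorted(children[node]):
            if state[nxt] == in_progress:
                problems.append(f"cycle through {node} -> {nxt}")
            elif state[nxt] == unvisited:
                visit(nxt)
        state[node] = done

    for node in sorted(set(parents) | set(children)):
        if state[node] == unvisited:
            visit(node)
    return problems
```

A well-formed tree yields an empty list, whereas arcs such as ("A", "B") and ("B", "A") are reported as a cycle. With a linkbase, this whole check has to be rerun after every update; in an ontology, such properties could instead be declared on the hierarchy relation and left to a reasoner.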
This disadvantage could be overcome if it were possible to traverse all elements of a taxonomy or conceptual hierarchy recursively in a simple way. This capability can be provided by logic programming constructs if a suitable representation of XBRL conceptual hierarchies is available. By defining an OWL representation [28,20] of at least part of the XBRL conceptual layer together with the fact elements, we pave the way for such applications. A powerful reasoning language for OWL ontologies is the Jena rule language, part of the open-source Jena framework for Java-based RDF and OWL persistence, querying and reasoning. The availability of logical inferencing applied to XBRL conceptual structures also makes it possible to query for all atomic monetary items contributing to a given aggregated item.

Fig. 6. Example of proposed XBRL ontology design for a specific balance sheet assets taxonomy. Relational monetary items are represented as association classes linking the corresponding subitems in the given reporting taxonomy.
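In an OWL setting, the query for all atomic monetary items contributing to an aggregated item would typically be expressed as rules executed by an inference engine such as the Jena rule engine. The plain Python function below only illustrates the underlying recursive traversal, assuming a hypothetical dictionary that maps each composed item of one taxonomy to its direct sub-items; it is not Jena code.

```python
def atomic_contributors(item, children):
    """All atomic (leaf) items whose values feed into `item`.

    `children` maps each composed item to its direct sub-items in one
    taxonomy; an item with no entry is treated as atomic."""
    subs = children.get(item)
    if not subs:
        return {item}
    result = set()
    for sub in subs:
        result |= atomic_contributors(sub, children)
    return result
```

Because the function only depends on the mapping, the same query can be run against each taxonomy applied to a report, which is the kind of taxonomy-independent formula definition the preceding paragraphs call for.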

5. Conclusion

The present paper shows the importance of choosing appropriate modelling frameworks in the design of domain ontologies for real-world applications. The topic here is the special case of a highly elaborate existing information structure framework, namely the XBRL specification for business reporting languages. From a simple automation point of view, transformations from XML schema documents to ontologies could have been generated with very low effort. However, as we have seen, XBRL is developed on the basis of a fundamental separation of a data layer from a conceptual layer, which leads to a diversity of XML schema documents. The important component of the conceptual layer to be adequately captured in an ontology is the conceptual hierarchy or taxonomy. XBRL is richer than most ontologies, as it allows a multitude of taxonomies to be applied to given reporting instance data. Thus, capturing XBRL by one generic taxonomy would have completely missed the target. Therefore, we concentrated on the modelling principles governing conceptual hierarchies on the one hand and aggregation hierarchies as used in data warehouse dimensions on the other hand, referring to UML class models, the OMG CWM and the OMG ontology definition metamodel. As a matter of fact, XBRL taxonomies can only be understood from a common integrative perspective based on all of these modelling principles. Future work, especially integration with reasoners in the framework of MUSING services, will demonstrate the integrative capabilities of business performance management integrating reporting data and other business data in a common knowledge discovery framework.

Acknowledgements

This paper gives theoretical foundations for ontology design as proposed to the EU MUSING project, see http://www.musing.eu, grant EU IP 27097, under which this work was also funded. The author is indebted to many colleagues from MUSING team partners for discussions of options and benefits of ontologies for implementing the next generation of knowledge-intensive financial services. I would like to name in particular Mrs Monika Jungemann-Dorner of Verband der Vereine Creditreform, Dr. Paolo Lombardi of Bank Monte dei Paschi di Siena, Drs. Thierry Declerck and H.-U. Krieger of DFKI Saarbrücken, Prof. Paolo Giudici of Pavia University, and Prof. Ron Kenett of KPA and Torino University. Moreover, I would like to thank the members of the U Innsbruck team as of May 2008, especially Mag. Daniel Bachlechner, the author of XSLT scripts implementing some of the transformations defined in this document, and Dipl.-Inf. Christian Leibold. Finally, two anonymous reviewers gave me very important clues on some issues regarding the relationship of ontologies with the modelling levels in UML.

References

[1] Basel Committee on Banking Supervision, International convergence of capital measurement and capital standards: a revised framework, comprehensive version, Technical Report, Bank for International Settlements, 2006.
[2] E. Cohen, T. Lutes, XBRL in tax and government, Public working draft, XBRL International, 2005.
[3] P. Engel, M. Stanley, W. Hamscher, G. Shuetrim, D. van Kannon, H. Wallis, Extensible business reporting language (XBRL), Recommendation, XBRL International, 2003.
[4] M.J. Firestone, Enterprise Information Portals and Knowledge Management, Butterworth-Heinemann, Elsevier Science, Amsterdam, Boston, 2003.
[5] W. Hamscher, G. Shuetrim, Formula specification requirements, Public working draft, XBRL International, 2007.
[6] I. Hernandez-Ros, H. Wallis, XBRL Dimensions 1.0, Recommendation, XBRL International, 2006.
[7] W.B. Inmon, Building the Data Warehouse, Wiley, New York, 2002.
[8] F. Jouault, J. Bézivin, KM3: a DSL for metamodel specification, in: FMOODS, 2006.
[9] Object Management Group, Unified modeling language: superstructure specification, 2007.
[10] Object Management Group, Unified modeling language: infrastructure, 2007.
[11] Ontology definition metamodel specification, Technical Report, Object Management Group, 2006.
[12] T. Declerck, H.-U. Krieger, Translating XBRL into description logic. An approach using protege, sesame and OWL, Technical Report, German Research Center for Artificial Intelligence (DFKI), 2006.
[13] K. Bontcheva, C. Brewster, F. Ciravegna, H. Cunningham, L. Guthrie, R. Gaizauskas, et al., Using human language technology for acquiring, retrieving and publishing knowledge in AKT: position paper, in: Workshop on Human Language Technology and Knowledge Management, Toulouse, France, 2001. Available from: http://www.elsnet.org/acl2001-hlt+km.html.
[14] R.S. Kenett, O. Rapaheli, Multivariate methods in enterprise system implementation, risk management and change management, Int. J. Risk Assess. Manage. 9 (3) (2008) 258–276.
[15] Microsoft, MDX language reference, Microsoft Corporation, 2008. Available from: http://technet.microsoft.com/en-us/library/ms145595.aspx.
[16] Mondrian Project, How to design a Mondrian schema, Pentaho Open Business Intelligence, 2008. Available from: http://mondrian.pentaho.org/documentation/schema.php.
[17] Oracle, OLAP DML reference 10g release 2 (10.2), Technical Report, Part number B14346-01, Oracle, 2006.
[18] E. Yashchin, Modeling of risk losses using size-based data, IBM J. Res. Dev. 51 (3–4) (2007) 309–323.
[19] ABZ Informatik, The adaptive business reporting automat, Technical Report, XBRLOpen.org, Fraunhofer IPSI, 2007.
[20] B. Motik, P. Patel-Schneider, I. Horrocks, OWL 1.1 web ontology language structural specification and functional-style syntax, 2006.
[21] D. Chang, S. Iyengar, Common warehouse metamodel (CWM) specification, Object Management Group, 2001.
[22] M. Spies, D. Bachlechner, Transformations of business reporting language constructs from and to ontology languages with XSLT, Technical Report, Innsbruck University, 2008.
[23] M. Kay, XSL transformations (XSLT), Version 2.0, W3C World Wide Web Consortium, 2007.
[24] B. Ganter, R. Wille, Formale Begriffsanalyse, Springer, Berlin, 1996.
[25] M. Spies, Portalbasiertes Wissensmanagement und seine Unterstützung durch automatische Generierung von Begriffssystemen, in: H. Mandl (Ed.), Wissensmanagement, Hogrefe, Göttingen, 2004, pp. 277–287.
[26] J. Breuker, R. Hoekstra, A. Boer, K.v.d. Berg, G. Sartor, R. Rubino, A. Wyner, T. Bench-Capon, M. Palmirani, OWL Ontology of Basic Legal Concepts (LKIF-Core), 22 January 2007.
[27] W.v.O. Quine, Mengenlehre und ihre Logik, Ullstein, Berlin, 1978.
[28] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. McGuinness, P. Patel-Schneider, L. Stein, OWL web ontology language reference, Technical Report, W3C, 2004.
[29] S. DeRose, E. Maler, D. Orchard, XML linking language (XLink), Version 1.0, W3C World Wide Web Consortium, 2001.
