Quality assessment of coupled civil engineering applications


Advanced Engineering Informatics 25 (2011) 625–639

Contents lists available at SciVerse ScienceDirect

Advanced Engineering Informatics journal homepage: www.elsevier.com/locate/aei

Toni Fröbel a,*, Berthold Firmenich b, Christian Koch c

a Bauhaus-Universität Weimar, Berkaerstraße 9, 99423 Weimar, Germany
b CADEMIA-Consult GmbH, Am Lotzenwald 58, 65719 Hofheim am Taunus, Germany
c Institute for Computational Engineering, Ruhr-Universität Bochum, Universitätstraße 150, 44801 Bochum, Germany

Article info

Article history: Available online 7 September 2011

Keywords: Data exchange; Quality assessment; Uncertainty; Coupling; BIM; Data mapping

Abstract

The software landscape in civil engineering is characterized by a large number of more or less specialized applications for different tasks. To solve its task efficiently, each application has its own optimized data structure. The variety of software tools used to support the design process requires an exchange of data and information between the engineers involved and their applications. Such an exchange can be achieved by schema mapping, which has been an active subject of research during the last decade. However, because the data schemas are incompatible, data and information may be lost, and this loss needs to be quantified. Current evaluation processes mainly operate on the data and work a posteriori. The changes in data and information that result from inadequate mappings between the data schemas of the applications to be coupled are identified either by visual inspection or by file comparison and are classified according to certain criteria. The user then has to evaluate the changes qualitatively. In this paper, a generic a priori approach to assess coupling quality is introduced. Software coupling in computer science refers to enabling software applications to work together and thus to achieve a common objective. This can be achieved by data exchange, i.e. the transfer of the needed data between the coupled applications. The quality of the coupling depends on the quantity and the accuracy of the data to be transferred. The formalism to assess coupling quality is described mathematically using set theory and graph theory. The approach operates on the schemas involved, is not limited to a common data exchange format, and takes various mapping patterns into account. Moreover, the coupling quality is evaluated in the formalization process, which results in a global quality value. This quality value can then be used directly by the user to assess the data exchange. A synthetic scenario from civil engineering illustrates the formalization process. Finally, the applicability of the proposed approach is shown in a real-world case study.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +49 3643 584110. E-mail address: [email protected] (T. Fröbel).
doi:10.1016/j.aei.2011.08.005

1. Introduction

The planning process of buildings is highly complex and cannot be managed in its entirety. The state of the art is to decompose the complex task into smaller, manageable ones, which can then be solved separately and concurrently by different engineers. However, this leads to a lot of communication between the engineers and consequently between the more or less specialized software applications that the engineers use to solve the smaller tasks. Because the partial tasks depend on each other semantically, the software must also be semantically coupled: for example, building elements and their interrelations have to be transferred between the coupled applications and must be interpreted in the same way by these applications.

This is quite challenging, as the coupling solution depends on several factors. Software applications use their own data structures, which are optimized to solve the task. In addition, these applications might run on different operating systems and computers or might be implemented in different programming languages, which in turn might be based on different programming paradigms. Furthermore, software applications are influenced by the rapid evolution in computer science. The development of software applications is a continuous process in which new software technologies and paradigms are used to improve and extend functionality. For example, the object-oriented concept is now widely used for the development of software applications, and the definition of data exchange schemas has therefore changed from simple text files to product data models that are structured by complex type inheritance hierarchies. Thus, various coupling strategies are needed and have to be adjusted to the new software concepts. One widespread strategy in civil engineering is data structure coupling, where software applications are coupled on the basis of their internal data structures. This requires several data mappings between the data structures used.

An error-free exchange of digital data is one of the most important factors in enabling collaborative and well-distributed working across the borders of disciplines and organizations. Teeuw et al. [1] concluded that it is worthwhile to use international, open standards for data communication and exchange. Due to incompatible data schemas and the many data mappings, perfect semantic interoperability cannot be expected [2]. Gielingh [3] added that, with the current generation of product data technology (PDT) standards, a loss of data or meaning can hardly be avoided. Other researchers highlighted the difficulties of semantic interoperability between applications with different internal schemas [4–8]. Vergeest and Horvath [9] went further and distinguished between different types of interoperability issues in an information exchange transaction.

The evaluation of semantic interoperability between software applications has been an active subject of research during the last decade. The SPADEX [10] and PM4D [11] reports documented a loss of information and a misrepresentation of geometry in the data exchange of Industry Foundation Classes (IFC) certified software applications. Similar results are presented in [12–14]. Ma et al. [15] specified a number of changes that occur during the data exchange: entities appeared, disappeared or changed. Further studies and data interoperability benchmark tests confirmed these results [16–18].

However, the design material in engineering practice (civil, mechanical and industrial engineering) is very sensitive. Lost or incorrect data can lead to numerous problems. Therefore, the assessment of the coupling quality, i.e. the quality of the mapping, is important and necessary. Current evaluation processes mainly operate on the data and work a posteriori.
The changes are identified by visual inspection or via file comparison and are listed according to criteria such as physical file size, differing numbers of instances, inconsistent object types and attribute values, or schema inconsistencies [15]. Differences in the physical file size or in the number of instances are not a strong indicator of coupling quality. Inconsistent object types and attribute values matter for an assessment of the coupling quality only if they are needed in the software application to be coupled and if their loss would influence the coupling results. Thus, each listed change has to be qualitatively evaluated by the user. Furthermore, the data exchange must occur via a common data format, mostly a standardized one, and the single objects must be identifiable, mostly by a global ID.

In this paper, a generic a priori approach to assess coupling quality is introduced. This approach operates on the schemas involved and is neither limited to a common data format nor restricted to civil engineering applications. The coupling quality is evaluated in the formalization process, which results in global quality values and indicators. These can be used directly by the user to assess the data exchange.

This paper is structured as follows: Section 2 presents a review of related work in testing and assessment of data structure based coupling; Section 3 introduces a synthetic scenario from civil engineering, which is used in the subsequent sections; Section 4 describes the formalization of the schema mapping; Section 5 addresses the assessment of the schema mapping; Section 6 presents a real-world case study; Section 7 includes conclusions; and Section 8 addresses future directions and challenges.

2. Related work

Current evaluation approaches for semantic interoperability work a posteriori and operate on available data. The evaluation can be done by visual inspection or by file comparison. Visual inspection means that the engineer detects changes in the data directly on the screen or on printed drawings. Evidently, visual inspection can only detect major problems, such as missing or misrepresented elements. The file comparison approach, described in Section 2.1, can detect further minor changes in the data but is limited to certain conditions. To eliminate most of the disadvantages of these approaches, an a priori approach, introduced in Section 2.2, can be applied.

2.1. File comparison – a posteriori approach

File comparison in general is an approach to assess the quality of data exchange for a given set of instances. This method operates directly on the data. The assessment process itself is based on the data exchange via files (Fig. 1). In a first step, the instances of application A that are to be transferred have to be exported to an external file. In the next step, the external file has to be imported into application B. After these two steps, the objects have to be transferred back to application A in the same way. During these steps no changes are made to the data [15]. In theory, the whole import and export process should be lossless, and the data should not be modified. In practice, however, a loss of data occurs, and this loss is inevitable.
In a final step, the changes can be identified via file comparison of both external files. The changes can be automatically categorized according to criteria such as the physical file size, differing numbers of instances, and inconsistent object types and attributes. An advantage of this approach is that only the standard schema must be known. This leads to a fast implementation, but the approach cannot be used for every data transfer and coupling scenario. Several conditions must be fulfilled. First of all, the data exchange must use a common data format, preferably a standard. In addition, to compare objects between the import and the export file, the objects have to be unique and distinguishable from one another. Thus, the single objects within the standard schema must be identifiable, mostly by a global ID. Furthermore, the whole process of mapping and file comparison is time and resource consuming, and the results have to be qualitatively evaluated by the user.

Fig. 1. File based data exchange via a standard schema.

In the last 30 years several exchange standards for digital product data, such as ISO 10303 (STEP) or ISO/PAS 16739 (IAI/IFC), have been developed. Particularly in the construction sector, the use of the IFC standard has led to a new trend, the so-called Building Information Modeling (BIM). A research project at the University of Auckland developed a software system called EVASYS (EXPRESS Evaluation System) that allows similarities and differences between two IFC models under the EXPRESS schema to be evaluated [15]. EVASYS provides a quick and easy way to compare two IFC files automatically according to different object types and their corresponding attribute values. The differences are listed in different categories. However, an overall qualitative assessment of the data exchange is missing. The engineer eventually has to go through the whole list of differences and judge manually whether the task can be accomplished on this basis. Furthermore, EVASYS can only compare two EXPRESS files.

2.2. Evaluated schema mapping – an a priori approach

Schema mapping (Fig. 2) is widely used in applications that involve data sharing or data transformation and plays a central role in data exchange and integration [19]. A schema in general contains the data structures and their relationships that are needed to describe and to solve a given task. Schema mapping typically describes the relationship between two schemas and can be understood as a triple consisting of a source schema, a target schema and a set of relationships between the two. To implement schema mapping, the overlaps of the schemas involved have to be detected, which is also known as schema matching. This can be done directly (Fig. 3, left) or indirectly via a standard schema (Fig. 3, middle and right).
It should be noted that the quantity of data that can be exchanged between two software applications cannot be increased by a standard. Fig. 3 shows, for different data schema configurations, the resulting set of exchangeable information, where A and B are the data schemas of the software applications to be coupled and S is the data schema defined by a standard. An additional loss of information may occur (Fig. 3, middle) if $(A \cap B) \setminus S \neq \emptyset$. If $(A \cap B) \setminus S = \emptyset$ (Fig. 3, right), the exchangeable information is the same as in the direct case (Fig. 3, left).

Schema matching is intensively investigated in many database application domains. A survey of approaches to automatic schema matching is given by Rahm and Bernstein [20], and a comparison of schema matching evaluations is presented by Do et al. [21]. Schema mapping can be described by well-known mapping patterns [22–25]. Mapping patterns describe how data structures of the source schema are related to the corresponding data structures of the target schema. Approaches to mapping languages in engineering domains have been developed [26–28]. Furthermore, schema mapping and mapping problems have been intensively examined, especially in the field of database management and design [29–31]. Overviews of advances in this field are given by Kolaitis [19] and Lenzerini [32]. However, Bakis [33] noted that the capabilities of model mapping languages are limited, as they only support the structural translation of semantically identical information. He further noted that, in order to reconcile any semantic differences, a computer-interpretable description of the semantics and of the logic for converting between semantics is required and has to be specified explicitly.

The implementation of a semantically correct mapping of overlapping schemas used in architecture, engineering and construction, such as IAI/IFC and STEP, is quite challenging. As a consequence of the complexity of engineering tasks, the schemas involved have hundreds of data structures with thousands of attributes and relationships.
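The set relations behind Fig. 3 can be checked with a few lines of Python. The following is a minimal sketch under stated assumptions: the element names in A, B and S are hypothetical placeholders for data structures, and only the relations themselves (direct exchange $A \cap B$, exchange via a standard $A \cap S \cap B$, additional loss $(A \cap B) \setminus S$) are taken from the text.

```python
# Hypothetical schema contents; only the set relations come from the text.
A = {"wall", "column", "slab", "material", "load"}   # schema of application A
B = {"wall", "column", "material", "mesh"}           # schema of application B
S = {"wall", "slab", "material", "door"}             # standard schema


def exchangeable_direct(a, b):
    """Information exchangeable when A and B are mapped directly."""
    return a & b


def exchangeable_via_standard(a, s, b):
    """Information exchangeable when the exchange passes through standard S."""
    return a & s & b


def additional_loss(a, s, b):
    """Part of the direct overlap that the standard cannot carry."""
    return (a & b) - s


direct = exchangeable_direct(A, B)
indirect = exchangeable_via_standard(A, S, B)
lost = additional_loss(A, S, B)

print("direct:  ", sorted(direct))
print("indirect:", sorted(indirect))
print("lost:    ", sorted(lost))
```

With these placeholder sets the standard does not cover "column", so the indirect exchange is a proper subset of the direct one (the middle case of Fig. 3). Adding "column" to S would give the best case, in which both sets coincide.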
Amor [34] found that without some definition of a mapping it is impossible to guarantee the correctness of any implemented translator; the author also illustrated and discussed the development of a suite of mapping support tools to ensure semantically correct mappings. Moreover, he recommended developing a range of certified mappings. A semantic mapping between CAD and IFC property definitions was investigated by Yang and Zhang [35]. However, Shen et al. [36] noted that one of the major problems of systems interoperability in the construction industry is the difficulty of accessing accurate data, information and knowledge in a timely manner in every phase of the construction project lifecycle.

An evaluated schema mapping would enable an a priori assessment of semantic interoperability in a timely manner. In contrast to the file comparison approach, the assessment of the data transfer does not operate on instances (data); it operates directly on the schemas, which represent the instances. A generic formalism of an evaluated schema mapping, which takes the various mapping patterns into account, is introduced in Sections 4 and 5.

Fig. 2. Schema mapping via a standard schema.

Fig. 3. Exchangeable information between two schemas: direct (left) and indirect via standard schema – common case (middle) and best case (right).

3. Scenario example

A synthetic scenario from civil engineering (Fig. 4) is used to illustrate the formalization (Section 4) and the assessment (Section 5) of data structure coupling via an evaluated schema mapping. However, since the approach presented in this paper is generic, its application is not limited to the civil engineering domain; it can also be applied to other engineering domains, for example mechanical or industrial engineering.

3.1. Coupling scenario

In an early planning phase, architect A makes a preliminary design of the building. Engineer B must then carry out a complex structural analysis. Thus, the existing planning data have to be exchanged unidirectionally. Due to the complexity of the structural analysis, engineer B has to collaborate with an additional engineer C. In this case, the data exchange is bidirectional. Because of the large number of software applications on the market, different applications and schemas are involved in the planning process, where a', b' and c' are the data schemas used by the software applications of engineers A, B and C. Engineers B and C finally agree on an exchange of planning data through a standard data schema s.

4. Formalization

4.1. Data structure coupling

Two software applications are called data structure coupled if they exchange data via a common schema. During the planning process, however, many software applications with different schemas have to be coupled. In this context, set Q contains all the schemas (Eq. (1)) and set C all the schema couplings (Eq. (2)) that are needed to solve the task.

$$Q := \{\, q \mid q \text{ is a schema} \,\} \tag{1}$$

$$C := \{\, (a, b) \in Q \times Q \mid \text{schema } a \text{ is to be coupled with schema } b \,\} \tag{2}$$

Data structure coupling can be implemented in two different ways. The point-to-point concept couples two schemas $a, b \in Q$ of two different applications directly. The respective schema coupling is $(a, b) \in C$. A schema consists of data structures and their relationships. The exchangeable information between two data schemas a and b can be described by the intersection $a \cap b$. Coupling m applications in this way leads to a maximum number n of schema couplings:

$$n = m \cdot (m - 1) \tag{3}$$

The standardized implementation instead couples two schemas $a, b \in Q$ of two different applications indirectly via an additional schema $s \in Q$, usually a standard schema. This is reflected in the schema couplings $(a, s) \in C$ and $(s, b) \in C$. The exchangeable information between two software applications via an additional standard is described by the intersection $a \cap s \cap b$. The relation between the number of schema couplings n and the number of coupled applications m is now linear:

$$n = 2 \cdot m \tag{4}$$

In data structure coupling, a loss of data or meaning in the exchange of product data can hardly be avoided. Errors are caused by differences in the data schemas, e.g. between a standard schema and the internal data schemas of software applications. If schemas differ, data conversions are needed. A loss of data occurs when one schema does not support data structures that exist in another one. A further loss of data, meaning or accuracy may occur when one schema supports conceptually similar data structures but with different representations. In this case, data conversions between the different representations are needed. A misinterpretation of data and information between different software systems can lead to numerous design errors within the planning process. For this reason, adequate assessment strategies have to be developed. These can then be used by designers and engineers as a tool to estimate the reliability of the data exchange. The assessment strategy presented in this paper is based on the evaluation of the attributes resulting from the schema mapping.
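The two coupling concepts of Section 4.1 differ in how the number of schema couplings grows with the number of applications. The sketch below enumerates the directed couplings for both concepts so that the counts $n = m \cdot (m - 1)$ and $n = 2 \cdot m$ can be checked; the function names and the schema labels are hypothetical and not part of the paper.

```python
from itertools import permutations


def point_to_point_couplings(schemas):
    """Every schema coupled directly with every other one, in both directions."""
    return set(permutations(schemas, 2))


def standardized_couplings(schemas, standard):
    """Every schema coupled only with the standard schema, in both directions."""
    to_standard = {(a, standard) for a in schemas}
    from_standard = {(standard, a) for a in schemas}
    return to_standard | from_standard


# Hypothetical application schemas; "s" plays the role of the standard schema.
applications = ["a", "b", "c", "d"]

p2p = point_to_point_couplings(applications)
std = standardized_couplings(applications, "s")

m = len(applications)
print(f"m = {m}: point-to-point n = {len(p2p)}, standardized n = {len(std)}")
```

For m = 4 the sketch yields 12 point-to-point couplings against 8 standardized ones. The standardized concept needs fewer couplings as soon as m exceeds 3, at the price of the possible additional information loss described above.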

Fig. 4. A synthetic coupling scenario from civil engineering (A to B: unidirectional, point-to-point; B and C: bidirectional, standardised via schema s).

4.2. Schema mapping

A schema in general consists of data structures and their relationships to describe data. One widespread modeling paradigm in computer science is the object-oriented paradigm. In this context, classes correspond to data structures, and objects correspond to particular instances of classes, which contain the data for a certain state. A coupling of two schemas can be achieved through the mapping of their data structures. Templates for the mapping are shown in Fig. 5. The deletion pattern is used if there is no equivalent data structure in the target schema. The copy pattern is used if a single data structure in the source schema is related to a single data structure in the target schema, whereas the splitting pattern is used if a single data structure in the source schema is related to more than one data structure in the target schema. The combining pattern is used if more than one data structure in the source schema is related to a single data structure in the target schema. Finally, the multiple-mapping pattern is a combination of the previous patterns.

Fig. 5. Mapping patterns according to Katranuschkov [24]: copy, splitting, deletion (mapping to ∅), combining and multiple-mapping.

A schema S is a set of data structures that are necessary to describe and to solve the given task:

$$S := \{\, e \mid e \text{ is a data structure} \,\} \in Q \tag{5}$$

Data structures inside a schema $S \in Q$ can interrelate to describe more complex facts. These relationships can be formulated with the power set P(S):

$$P(S): \text{power set of schema } S \in Q \tag{6}$$

The power set P(S) of any set S is the set of all subsets of S, including the empty set and S itself. The power set of S contains $|P(S)| = 2^n$ elements, where n is the number of elements in S. Finally, the schema mapping can be achieved by an unevaluated mapping function m, which maps elements $a \in P(A)$ of schema $A \in Q$ to corresponding elements $b \in P(B)$ of schema $B \in Q$:

$$m_{A,B} : P(A) \to P(B) \quad \text{with } A, B \in Q \tag{7}$$

As an example, the mapping of two schemas is illustrated in Fig. 6. The different templates are: copy (1), splitting (2), combining (3), deletion (4) and multiple mapping (5).

Fig. 6. Example schema mapping of schema A with schema B.

4.3. Application to the scenario example

The scenario example from Section 3 contains four different schemas and five schema couplings, i.e. five schema mappings have to be made to solve the given task:

$$Q = \{a', b', c', s\}, \qquad C = \{(a', b'), (b', s), (s, b'), (s, c'), (c', s)\}$$

The schema mapping is shown as an example for the schema pair $(c', s) \in C$. The schemas c' and s contain the following data structures:

$$c' = \{\, c_1{:}\,\text{myWall3D},\; c_2{:}\,\text{myMaterial},\; c_3{:}\,\text{myPoint3D} \,\} \in Q$$

$$s = \{\, s_1{:}\,\text{Wall2D},\; s_2{:}\,\text{Point2D} \,\} \in Q$$

Fig. 7a. Data structures of schema c': c1:myWall3D (l, w, h: double), c2:myMaterial (e, k: double), c3:myPoint3D (x, y, z: double).

Fig. 7b. Data structures of schema s: s1:Wall2D (l, w: double; e, k: double), s2:Point2D (x, y: double).

The data structures (Figs. 7a and 7b) are defined by the following variables: l, w and h describe the wall dimensions (length, width and height); x, y and z are Cartesian coordinates; e and k are material properties (Young's modulus and heat conductivity coefficient).

As an example, the mapping from schema c' to s is shown in Fig. 8a, and the corresponding mapping patterns are shown in Fig. 8b. A wall c1:myWall3D associated with a position point c3:myPoint3D and a material property c2:myMaterial is represented by $\{c_1, c_2, c_3\} \in P(c')$. The corresponding representation in schema s is $\{s_1, s_2\} \in P(s)$, as the data structure s1:Wall2D already includes the material properties. Both representations are coupled by the multiple-mapping pattern.
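The mapping patterns of Section 4.2 depend only on how many data structures appear on the source side and on the target side of a mapping entry. The following minimal sketch classifies entries of a mapping $m_{A,B} : P(A) \to P(B)$ that is stored as a dictionary of frozensets. The function name `classify` and the dictionary representation are assumptions made for illustration; the entries for the pair (c', s) are the ones named in the text, since the figure itself is not reproduced here.

```python
def classify(source, target):
    """Return the mapping pattern for one entry (source subset -> target subset)."""
    if not source:
        raise ValueError("a mapping entry needs at least one source data structure")
    if not target:
        return "deletion"            # no equivalent in the target schema
    if len(source) == 1 and len(target) == 1:
        return "copy"                # one-to-one
    if len(source) == 1:
        return "splitting"           # one-to-many
    if len(target) == 1:
        return "combining"           # many-to-one
    return "multiple-mapping"        # many-to-many


# Unevaluated mapping from schema c' to schema s, keyed by subsets of c'.
m_c_s = {
    frozenset({"c1"}): frozenset({"s1"}),
    frozenset({"c3"}): frozenset({"s2"}),
    frozenset({"c2"}): frozenset(),
    frozenset({"c1", "c2"}): frozenset({"s1"}),
    frozenset({"c1", "c2", "c3"}): frozenset({"s1", "s2"}),
}

patterns = {src: classify(src, dst) for src, dst in m_c_s.items()}
for src, pattern in patterns.items():
    print(sorted(src), "->", sorted(m_c_s[src]), ":", pattern)
```

Using frozensets as keys mirrors the power-set formulation: each key is an element of P(c') and each value an element of P(s), with the empty frozenset standing for ∅.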

Fig. 8a. Schema mapping from c' to s.

Fig. 8b. Corresponding mapping patterns.

To date, the mapping of one schema to another schema is not evaluated. A method for an a priori assessment of the data exchange is described in the next section.

5. Assessment

Generally, software applications are developed for different domains (CAD, FEM, facility management, etc.) to solve specific tasks, or they are optimized for similar purposes, for example architectural design. Information processing with a computer (Fig. 9) requires an appropriate abstraction of information. The information is the reality and the input, e.g. from the engineer, to describe and model real-life buildings, whereas the data is an abstract representation of this information within a software application, normally via data structures. The transformation of data back into information is achieved by the interpretation of data. A consequence of the abstraction of information and the interpretation of data is that perfect semantic interoperability between different internal schemas cannot be expected. In particular, the abstraction of information into data can be influenced by many sources of errors.

Fig. 9. Information processing with a computer.

5.1. Sources of mapping errors

Various error sources exist and reduce the quality of the mapping under certain conditions. Widespread error classes during the abstraction of information into data are numerical errors and errors of approximation. One subclass of numerical errors is the rounding error that can occur when values are converted between integer and floating-point types or between single and double precision. Another subclass occurs when mathematically exact formulations are represented by a computer. Table 1 (left) shows, for example, that the geometry of a wall can be described in different ways. The conversion of such mathematically exact representations usually leads to numerical errors.

A subclass of errors of approximation is the conversion of conceptually more or less similar data structures that have a different representation. For example, non-planar surfaces can be approximated by a mesh of planar surfaces or by a single free-form surface (Table 1, middle). The approximation error depends on how well the mesh approximates the desired surface. Another, non-geometrical example is the conversion between different color representations (Table 1, right). A color can be represented by RGB values, by CMY/CMYK values or by a web-safe color palette. The conversion of RGB values to the web-safe color palette can lead to an approximation error.

5.2. Example – Approximation errors in modeling columns

In this section, the error of exchanging different types of columns from a source system A to a target system B is estimated (Fig. 10). The source system has both a facet modeler and a free-form modeler to describe the different types of columns (e.g. prism, cylinder) as accurately as possible. The modeler of system B can only handle faceted solids. The error can be estimated by considering the differences in the volumes of the exchanged columns. The difference dV between volume $V_1$ of column A (source system) and volume $V_2$ of column B (target system) can be computed as follows:

$$dV = |V_1 - V_2|$$

Then an error e can be computed as the ratio of dV to $V_1$:

$$e = \frac{dV}{V_1}$$

Finally, the quality Q is described as:

$$Q = 1 - e$$

In the case of an exchange of rectangular columns there is no difference in the volumes, dV = 0 (Fig. 10a). This leads to e = 0 and Q = 1, as both modelers can handle faceted solids without a change in the representation of the column. In the case of an exchange of cylindrical columns there is a difference in the volumes, $dV \neq 0$ (Fig. 10b). The reason is that the modeler of the target system cannot handle free-form solids; instead it has to approximate the cylinder by an n-prism (Fig. 10b). The difference in the volume dV and the error e depend on the number n of sides of the prism base:

$$dV = \left| \pi r^2 h - \frac{r^2 n}{2} \sin\!\left(\frac{360^\circ}{n}\right) h \right| = \left| \pi - \frac{n}{2} \sin\!\left(\frac{360^\circ}{n}\right) \right| r^2 h, \qquad e = \left| 1 - \frac{n}{2\pi} \sin\!\left(\frac{360^\circ}{n}\right) \right|$$

As an example, approximating cylindrical columns of system A by n-prism columns of system B with n = 20 leads to e = 0.016 and Q = 0.984; any larger n gives a smaller error and a higher quality.
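The error formula for cylindrical columns is easy to evaluate numerically. The sketch below assumes, as the formula implies, that the n-prism is inscribed in the cylinder (its base vertices lie on the circle) and shares its height, so that radius and height cancel out of e. The function names are hypothetical.

```python
import math


def prism_error(n):
    """Relative volume error e when a cylinder is replaced by an inscribed n-prism."""
    if n < 3:
        raise ValueError("a prism base needs at least three sides")
    # 360 degrees / n expressed in radians is 2*pi / n.
    return abs(1.0 - n / (2.0 * math.pi) * math.sin(2.0 * math.pi / n))


def quality(n):
    """Quality Q = 1 - e of the exchanged column."""
    return 1.0 - prism_error(n)


for sides in (6, 12, 20, 40):
    print(f"n = {sides:2d}: e = {prism_error(sides):.3f}, Q = {quality(sides):.3f}")
```

For n = 20 the sketch reproduces the values quoted above (e = 0.016, Q = 0.984). Because e decreases monotonically with n, a required quality level translates directly into a minimum number of base sides for the target modeler.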

Table 1. Examples of geometrical and non-geometrical errors (columns: numerical error; error of approximation, geometrical; error of approximation, non-geometrical).

Fig. 10a. Rectangular column.

Fig. 10b. Circular column.

5.3. Evaluated schema mapping

A set of quality values R is used for an assessment of the data exchange. A quality value $r \in R$ describes how good the mapping of two corresponding data structures is. The domain can be defined arbitrarily, for example linguistically or numerically. For now, R is the subset of the real numbers $\mathbb{R}$ in the interval [0, 1]. A value of 1 means no loss of information during the exchange, whereas a value less than 1 means that only a part of the information is exchanged.

$$R := \{\, r \in \mathbb{R} \mid 0 \le r \le 1 \,\} \tag{8}$$

The qualitative assessment of a data exchange between two schemas $A, B \in Q$ is achieved by an evaluated schema mapping $\bar{m}_{A,B}$. Each mapping $(a, b) \in m_{A,B}$ is associated with a quality value $r \in R$:

$$\bar{m}_{A,B} : m_{A,B} \to R \tag{9}$$

Thus far, the quality of data exchange is defined on schema level, but the quality of data exchange for a specific set of instances has to be computed on instance level. As a prerequisite, the data instances must be known. The instances to be exchanged are in the instance set $I_A$:

$$I_A := \{\, i \mid i \text{ is an instance of a data structure } e \in A \in Q \,\} \tag{10}$$

The relationships of instances $i \in I_A$ follow exactly the relationships defined in the underlying schema $A \in Q$. Consequently, the instance relationships can be formulated by the power set $P(I_A)$. The relationship between instances and schema therefore is:

$$\mathrm{type} : P(I_A) \to P(A) \tag{11}$$

Finally, the quality of the mapping for a set of instances $I_A$ of schema $A \in Q$ to another schema $B \in Q$ can be computed via $\bar{m}_{A,B}$, in which $n_{\mathrm{type}(i)}$ is the number of instances of a specific type $i \in I_A$:

$$\text{Quality} := f(\bar{m}_{A,B}, I_A) := \sum_{i \in P(I_A)} \sum_{j \in P(B)} \frac{\bar{m}_{A,B}(\mathrm{type}(i), j)}{n_{\mathrm{type}(i)}} \tag{12}$$

The evaluated mapping function $\bar{m}_{A,B}$ provides the quality value for the mapping of an instance i, created on the basis of a specific data type type(i) of source system A, to the corresponding schema element $j \in P(B)$ of target schema B. For example, in Table 2, type(i) refers to one specific row (e.g. {c1}), whereas j refers to the corresponding column (e.g. {s1}), resulting in the quality value of 0.7.

The quality value for an evaluated mapping $(a, b) \in m_{A,B}$ can be computed from the attributes of the data structures involved alone. Data structures themselves can be treated as schemas for instances, and attributes as data structures. This allows the same assessment formalization process to be used again.

5.4. Application to the scenario example
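As a minimal sketch of the instance-level assessment, Eq. (12) is read here as the average of the mapping quality values weighted by instance counts, which is the normalization used in the scenario computation of this section. The function name `transfer_quality` is hypothetical. The value 0.7 for {c1} to {s1} and the instance counts 4, 4, 2 and 4 are taken from the scenario; the values 0.7 for {c3} to {s2} and 1.0 for the multiple mapping are assumptions chosen to be consistent with the reported result of 0.686, since the values of Table 2 are not reproduced here.

```python
def transfer_quality(instance_counts, evaluated_mapping):
    """Instance-count-weighted mean of the mapping quality values.

    instance_counts:   source subset -> number of instance groups of that type
    evaluated_mapping: source subset -> quality value r in [0, 1]
    """
    total = sum(instance_counts.values())
    if total == 0:
        raise ValueError("at least one instance is required")
    weighted = sum(n * evaluated_mapping[src] for src, n in instance_counts.items())
    return weighted / total


# Evaluated mapping from c' to s (targets are implied by the source subset).
m_bar_c_s = {
    frozenset({"c1"}): 0.7,              # wall: height is lost (from the text)
    frozenset({"c3"}): 0.7,              # point: assumed, z coordinate lost
    frozenset({"c2"}): 0.0,              # material alone maps to the empty set
    frozenset({"c1", "c2", "c3"}): 1.0,  # multiple mapping: assumed value
}
counts_c_s = {
    frozenset({"c1"}): 4,
    frozenset({"c3"}): 4,
    frozenset({"c2"}): 2,
    frozenset({"c1", "c2", "c3"}): 4,
}

# Reverse direction s -> c': every mapping is lossless.
m_bar_s_c = {frozenset({"s1"}): 1.0, frozenset({"s2"}): 1.0, frozenset({"s1", "s2"}): 1.0}
counts_s_c = {frozenset({"s1"}): 4, frozenset({"s2"}): 4, frozenset({"s1", "s2"}): 4}

q_c_s = transfer_quality(counts_c_s, m_bar_c_s)
q_s_c = transfer_quality(counts_s_c, m_bar_s_c)
print(f"Q(c' -> s) = {q_c_s:.3f}, Q(s -> c') = {q_s_c:.3f}")
```

Keeping the instance counts separate from the evaluated mapping reflects the a priori idea: the quality values are fixed once per schema pair, and only the counts change from one data set to the next.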

For the scenario example, the schema mapping $m_{c',s}$ (Fig. 8a) is extended to an evaluated schema mapping $\bar{m}_{c',s}$. Therefore, each mapping has to be evaluated. The attribute mapping from data structure $c_1 = \{l, w, h\}$ to $s_1 = \{l, w, e, k\}$ (Fig. 11a) and the evaluated mapping (Fig. 11b) are taken as an example. Conversions between different representations, e.g. different coordinate systems or different geometrical descriptions of a wall, are not necessary. The attributes l, w of data structure c1:myWall3D can be mapped directly to the corresponding attributes l, w of data structure s1:Wall2D without any loss of information. The attribute h of data structure c1:myWall3D can only be mapped to $\emptyset$, which means an information loss will occur. The overall quality of mapping data structure c1:myWall3D to s1:Wall2D is:

$$Q_{c_1,s_1} = \frac{\bar{m}(\{l\},\{l\}) + \bar{m}(\{w\},\{w\}) + \bar{m}(\{h\},\{\emptyset\})}{1 + 1 + 1} = \frac{1.0 + 1.0 + 0.0}{3} \approx 0.7$$
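The attribute-level average can be sketched in a few lines. This is only an illustration of the arithmetic; the function name `structure_quality` and the dictionary layout are assumptions, not part of the original formalism.

```python
from typing import Dict


def structure_quality(attribute_quality: Dict[str, float]) -> float:
    """Average the per-attribute quality values of one source data structure."""
    if not attribute_quality:
        raise ValueError("at least one source attribute is required")
    return sum(attribute_quality.values()) / len(attribute_quality)


# c1:myWall3D -> s1:Wall2D: l and w map directly, h has no counterpart (quality 0.0).
q_c1_s1 = structure_quality({"l": 1.0, "w": 1.0, "h": 0.0})
print(round(q_c1_s1, 3))  # 0.667, which the text rounds to 0.7
```

Note that the average runs over the source attributes only: the target attributes e and k of s1 do not enter the sum, since they carry no information from c1.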

The complete schema mapping functions $\bar{m}_{c',s}$ and $\bar{m}_{s,c'}$ for an assessment of a data exchange from $c' \in Q$ to $s \in Q$ are shown in Table 2.

$$c' = \{c_1, c_2, c_3\} \in Q, \qquad s = \{s_1, s_2\} \in Q$$

For example, the mapping between {c1:myWall3D} and {s1:Wall2D} has a quality value of 0.7 because the height cannot be exchanged. The mapping between {c1:myWall3D, c2:myMaterial} and {s1:Wall2D} has a quality of 1.0 because the material property is included in s1:Wall2D. On the other hand, the exchange from the 2D schema $s$ to the 3D schema $c'$ takes place without any loss of information. It should be noted that the quality values are to be defined according to practical applications. This is exemplified for specific instance sets (Fig. 12), in which $w_i$ are instances of walls, $p_i$ are instances of points and $m_i$ are instances of materials.

Table 2. The needed schema mapping functions as matrices.

Fig. 11a. Attribute mapping.

Fig. 11b. Evaluated attribute mapping.

The overall quality of a data transfer of both instance sets can be computed as follows:

$$Q_{c',s} = \frac{4\,\bar{m}(\{c_1\},\{s_1\}) + 4\,\bar{m}(\{c_3\},\{s_2\}) + 2\,\bar{m}(\{c_2\},\emptyset) + 4\,\bar{m}(\{c_1,c_2,c_3\},\{s_1,s_2\})}{4 + 4 + 2 + 4} = 0.686$$

$$Q_{s,c'} = \frac{4\,\bar{m}(\{s_1\},\{c_1,c_2\}) + 4\,\bar{m}(\{s_2\},\{c_3\}) + 4\,\bar{m}(\{s_1,s_2\},\{c_1,c_2,c_3\})}{4 + 4 + 4} = 1$$

A reconsideration of the scenario example shows that the assessment works well for these specific, simple scenario conditions. Equivalent representations (e.g. for a wall) and data types with the same precision (double values) are used for this example. The only differences are the reduced geometrical dimension and the variety in the modelling of classes and their associations.

6. Case study

The proposed approach to assess the quality of data exchange based on schema mappings has been developed in the context of the Research Training Group 1462, "Evaluation of Coupled Numerical Partial Models in Structural Engineering". The overall goal of this research group is to build up a methodical basis that can assess the quality of prognosis models in structural engineering in a quantitative manner. Due to the complexity of the task, the research group has been divided into twelve sub-projects. Each sub-project investigates sub-problems with the aid of different software applications. However, to find out model divergence it is necessary to examine the complete task consisting of all its sub-tasks. This makes an exchange and sharing of data and information between the sub-projects, and consequently between the software applications used, essential. Hence, an error-free data exchange between the software applications used has become a prerequisite for building up such a methodical basis. Investigations have been carried out and results verified on three reference objects: a tower structure, a bridge construction, and a multi-storeyed frame structure. The proposed approach has been applied to the multi-storeyed frame structure in order to meet the requirements for assessing data exchanges, e.g. in the scope of linear calculations regarding adaptive structural design considering soil-structure interaction.

6.1. Multi-storeyed frame structure
The multi-storeyed frame structure is in general suitable for examining the essential problems of interaction between structure and foundation, earthquake influences, bracing systems and the effects of spatially distributed influences. Therefore it has been adopted within various sub-projects of the research group. Because of the different tasks, the frame structure has to be modelled in various levels of detail concerning geometry, material, masonry infill, kinematics, soil-structure interaction, etc. The elements of the frame structure have been classified into primary and secondary elements.

• Primary elements are basic elements such as beams and columns. They are an inherent part in modelling the frame structure.
• Secondary elements are additional elements such as braces and panels (masonry infill), foundation and soil (external elements), fixed supports and linear springs (soil-structure interaction), bearing, boundary conditions and loads (mechanics), or the different types of steel, concrete and soil (material). They are an optional part in modelling the frame structure and their use depends on the task which has to be examined.

Fig. 13 shows the geometrical properties and different modelling aspects for masonry infill. The process of schema analysis, schema mapping and evaluated schema mapping is shown for modelling columns, which are primary elements of the frame structure.

Fig. 12. Instance sets containing walls, points and materials.

Fig. 13. The multi-storeyed frame structure with braces (left) and panels (right).

6.2. Coupling scenario
The frame structure introduced in Section 6.1 has been adopted within the research group in order to investigate various sub-problems with the aid of different software applications.

• ANSYS, a FEM-based engineering simulation software, is used within the scope of the development of a multi-criteria evaluation method for the prognosis quality of complex engineering models. Furthermore, it has been used for static linear calculations in the context of adaptive structural design considering soil-structure interaction.
• SAP2000, another FEM-based engineering simulation software, has been used for dynamic non-linear calculations in the context of adaptive structural design.
• Further software applications of different domains such as CADEMIA (design), SLANG (stochastic modelling), PLAXIS (geotechnical analysis) or SYSWELD (welding simulation), which have already been used within the research group, may also be linked to the scenario.

Due to the number of different software applications involved, a centralized approach is proposed in order to enable collaboration and systems integration. In addition, the Industry Foundation Classes, which have been developed to describe building and construction industry data, are chosen as a central building data model. This leads to the coupling scenario shown in Fig. 14.

Fig. 14. Coupling scenario regarding the multi-storeyed frame structure.

Schema analysis and schema mapping within each schema coupling are important issues in order to assess the quality of data exchange and to meet the requirements to enable collaboration and systems integration. The proposed coupling scenario (Fig. 14) consists of seven participating schemas:

$$Q = \{IFC, S_{Ansys}, S_{SAP}, S_{SLANG}, S_{CADEMIA}, S_{SYSWELD}, S_{PLAXIS}\},$$

and twelve schema couplings, i.e. twelve schema analyses and mappings have to be made:

$$C = \{(IFC, S_{Ansys}), (S_{Ansys}, IFC), (IFC, S_{SAP}), (S_{SAP}, IFC), (IFC, S_{SLANG}), (S_{SLANG}, IFC), (IFC, S_{CADEMIA}), (S_{CADEMIA}, IFC), (IFC, S_{SYSWELD}), (S_{SYSWELD}, IFC), (IFC, S_{PLAXIS}), (S_{PLAXIS}, IFC)\}$$

The proposed approach to assess the quality of data exchange based on schema mappings will be applied to the schema coupling $(IFC, S_{Ansys}) \in C$, which is needed within the research group, e.g. in the scope of linear calculations for adaptive structural design considering soil-structure interaction.

6.3. Schema analysis
Data structures inside a schema $S \in Q$ can be interrelated to describe more complex facts. These relationships are included in the power set $P(S)$. From a mathematical point of view, the power set contains all the possible relations corresponding to the single data structures of a schema. However, from a software-engineering point of view, the data structures of a schema are only in relationship with a subset of data structures. In this context, the schema analysis is used to figure out all the logical connections in $P(S)$ which, from a software-engineering point of view, are not included in the schema $S$. Those relations will never be activated during the mapping process and can be removed from the original power set. Hence, a schema analysis has to take into account the various data types (e.g. abstract data types), on the one hand, and the different types of logical connections (e.g. general relations, instance/class-level relations and optional/required relations), on the other hand.

The examined schema coupling $(IFC, S_{Ansys}) \in C$ consists of two different schemas, the IFC schema and the ANSYS schema. Certain automated stepwise schema analyses regarding the IFC schema have been carried out on subsets of entities, which are needed for modelling the various primary and secondary elements of the frame structure. For example, in order to model columns, the chosen subset $S_{Col} \subseteq IFC$ comprises 56 more or less interrelated entities. Mathematically, the power set of $S_{Col}$ leads to $|P(S_{Col})| = 2^{56}$ combinations, which are more or less valid by the IFC schema definition. However, the number of combinations can be drastically reduced via a stepwise schema analysis. The performed schema analysis includes ten steps (Fig. 15), which can be classified into three main categories: schema-rule steps, global-modelling steps and local-modelling steps.

• Schema-rule steps (Steps 1–6) have been based on the schema itself (i.e. data types, logical connections) and can be applied to every other subset out of the schema.
• Global-modelling steps (Steps 7–8c) have been based on assumptions and restrictions which were made in a more global context. They may be applied as well to other subsets.
• Local-modelling steps (Steps 9–10) have been based on assumptions and restrictions which were made in a strong relation to the used subset and to the task which had to be solved. They are rarely applicable to other subsets and tasks.

Finally, the power set $P(S_{Col})$ has been reduced to six valid combinations with regard to the restrictions made. Each combination includes the minimal set of needed IFC entities for modelling a certain type of column. Fig. 16 shows, as an example, the minimal set of entities for modelling I-shaped and rectangular columns.

6.4. Schema mapping
Schema mapping is one of the key issues within the proposed approach. The mapping between the IFC schema and the ANSYS schema $(IFC, S_{Ansys}) \in C$ is defined by the mapping function:

Fig. 15. The influence of the schema analysis on P(SCol).


Fig. 16. The minimal relationship tree (left) I-shaped and (right) rectangular columns.

$$m_{IFC,S_{ANSYS}} : P(IFC) \to P(S_{ANSYS}) \quad \text{with } IFC, S_{ANSYS} \in Q$$

It will be achieved through a mapping between entity data types, which are defined by the IFC, and APDL macros/commands, which are part of the ANSYS schema. Within the case study, only a subset $S_{Col} \subseteq IFC$ for modelling certain types of columns has been examined. The related mapping function is:

$$m_{S_{Col},S_{ANSYS}} : P(S_{Col} \subseteq IFC) \to P(S_{ANSYS}) \quad \text{with } IFC, S_{ANSYS} \in Q$$

The mapping of an I-shaped column $\in P(S_{Col})$ is shown in Fig. 17. Because both can be interrelated within their own environments, the mapping process is more or less a top-down process.

• The placement of a column is achieved by a local coordinate system, which is defined by a point and two direction vectors. Within ANSYS this fact can be modelled by a local coordinate system, which is defined by three points. Fortunately, both representations can be mathematically converted into each other.
• The geometrical representation of a column is modelled as an extruded area solid, which is defined by a parameterised section and an extrusion direction. Within ANSYS this fact can be modelled by a polygonal section and an extrusion direction, which leads to a geometrical approximation error.

Finally, the I-shaped column is defined by ten interrelated entities and can be mapped to an equivalent ANSYS macro consisting of 18 nested command calls. However, during the mapping various conversions have to be made, which may lead to errors in the data exchange.

Fig. 17. The mapping from an IFC I-shaped column to an APDL macro.

6.5. Evaluated schema mapping
The range of quality values has been chosen as $R := \{ r \in \mathbb{R} \mid 0 < r \le 1 \}$. A value of 1 means no loss of information during the exchange, whereas a value less than 1 means that only a part of the information can be exchanged. Finally, the qualitative assessment of the data exchange for schema mapping $m_{S_{Col},S_{ANSYS}}$ (Section 6.4) is achieved by an evaluated schema mapping:

$$\bar{m}_{S_{Col},S_{ANSYS}} : m_{S_{Col},S_{ANSYS}} \to R$$

An adequate quality value $r$ for each mapping in $m_{S_{Col},S_{ANSYS}}$ can be determined only on the basis of attributes. An evaluated mapping of an I-shaped column $\in P(S_{Col})$ is shown in Fig. 18. The information on column placements can be mapped without errors by two conversions, whereas the information on section representations cannot be mapped without errors due to different graphical representations (Fig. 19). However, the geometrical approximation error can be estimated, as it depends only on the fillet radius. In Fig. 20 the error for various I-profiles is shown. The maximum error is about five percent and has been used as the worst-case scenario, r = 0.95. The average of all approximation errors, r = 0.9631, can also be used as quality value.

Fig. 18. The attribute mapping from an IFC I-shaped column to an APDL macro.

Fig. 19. Columns with fillet radius (left – IFC) and without (right – ANSYS).

The final quality for mapping IFC columns to ANSYS macros can be computed by averaging the partial qualities. The averaging has to be computed for all the partial qualities of a branch, which then is the basis for the next higher branch (parent) (Fig. 21). The final quality value for mapping an IFC I-shaped column to a self-created ANSYS macro is the average of the child branches IfcLocalPlacement (r = 1.0) and IfcProductDefinitionShape (r = 0.975) and results in r = 0.9875. The whole evaluation process has been implemented in the same way for the various types of columns which are examined in $S_{Col} \subseteq IFC$ and leads to the needed evaluated schema mapping function $\bar{m}_{S_{Col},S_{ANSYS}}$ (Table 3).
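The branch-wise averaging can be illustrated with a small recursive sketch. Only the two top-level branch names and their values (1.0 and 0.975) come from the text; the two leaves placed under IfcProductDefinitionShape are hypothetical and were chosen so that the branch reproduces 0.975 from the worst-case value 0.95. The function name `branch_quality` is likewise an assumption.

```python
from typing import Dict, Union

# A node is either a leaf quality value or a branch of named child nodes.
Node = Union[float, Dict[str, "Node"]]


def branch_quality(node: Node) -> float:
    """Average child qualities bottom-up; a leaf returns its own value."""
    if isinstance(node, dict):
        if not node:
            raise ValueError("a branch needs at least one child")
        return sum(branch_quality(child) for child in node.values()) / len(node)
    return float(node)


i_shaped_column = {
    "IfcLocalPlacement": 1.0,
    "IfcProductDefinitionShape": {
        "lossless_child": 1.0,           # hypothetical leaf, not named in the text
        "section_approximation": 0.95,   # worst-case fillet error quoted in the text
    },
}

print(round(branch_quality(i_shaped_column), 4))  # 0.9875
```

Because each parent averages its direct children only, a loss deep in the tree is damped at every level on the way up, which is why a five percent section error ends up as a 1.25 percent loss for the whole column.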

6.6. Quality assessment
The quality of data exchange for the certain types of columns (Table 3) can be computed for arbitrary sets of column instances $I_{S_{Col}}$ in an a priori manner. The computation is based on the evaluated mapping function $\bar{m}_{S_{Col},S_{ANSYS}}$:

$$\mathrm{Quality} := f(\bar{m}_{S_{Col},S_{ANSYS}}, I_{S_{Col}}) := \sum_{i \in P(I_{S_{Col}})} \; \sum_{j \in P(S_{ANSYS})} \frac{\bar{m}_{S_{Col},S_{ANSYS}}(\mathrm{type}(i), j)}{n_{\mathrm{type}(i)}}$$

As an example, the quality for a set of different column instances (Fig. 22) has to be computed. The frame structure in Fig. 22 has been modelled by 22 I-shaped columns and 10 rectangular columns. The overall quality for a data exchange can be computed as follows:


$$Q_{c',s} = \frac{22\,\bar{m}(\{IfcColumn_{I\text{-}shaped}\},\{Macro_{I\text{-}shaped}\}) + 10\,\bar{m}(\{IfcColumn_{Rect}\},\{Macro_{Rect}\})}{22 + 10}$$

Fig. 20. The geometrical approximation error of IPE-profiles.

Fig. 21. The final mapping quality from an IFC I-shaped column.

Table 3
Schema mapping function $\bar{m}_{S_{Col},S_{ANSYS}}$ as matrix (rows: IFC column types, columns: ANSYS macros).

                     MacroRect  MacroCirc  MacroT-shaped  MacroL-shaped  MacroU-shaped  MacroI-shaped  ∅
IfcColumnRect        1.0        0          0              0              0              0              0
IfcColumnCirc        0          1.0        0              0              0              0              0
IfcColumnI-shaped    0          0          0              0              0              0.9875         0
IfcColumnL-shaped    0          0          0              0.98           0              0              0
IfcColumnU-shaped    0          0          0              0              0.945          0              0
IfcColumnT-shaped    0          0          0.99           0              0              0              0
∅                    0          0          0              0              0              0              1


Fig. 22. Three-dimensional view of the primary elements of the frame structure.

$$Q_{c',s} = \frac{22 \cdot 0.9875 + 10 \cdot 1.0}{22 + 10} = 0.9914$$
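The instance-weighted computation can be reproduced with a short sketch that looks up the non-zero entries of Table 3 and averages them over the column instances of Fig. 22. The names `TABLE_3` and `exchange_quality` are hypothetical, and treating an unlisted column type as a complete loss (quality 0.0) is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Non-zero entries of Table 3: (IFC column type, ANSYS macro) -> quality value.
TABLE_3: Dict[Tuple[str, str], float] = {
    ("IfcColumnRect", "MacroRect"): 1.0,
    ("IfcColumnCirc", "MacroCirc"): 1.0,
    ("IfcColumnI-shaped", "MacroI-shaped"): 0.9875,
    ("IfcColumnL-shaped", "MacroL-shaped"): 0.98,
    ("IfcColumnU-shaped", "MacroU-shaped"): 0.945,
    ("IfcColumnT-shaped", "MacroT-shaped"): 0.99,
}


def exchange_quality(instance_types: Iterable[str]) -> float:
    """Average the mapping quality over all instances, weighted by type count."""
    counts = Counter(instance_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the instance set is empty")
    quality_by_type = {source: q for (source, _macro), q in TABLE_3.items()}
    # Assumption: a type without an entry in Table 3 contributes 0.0.
    weighted = sum(n * quality_by_type.get(t, 0.0) for t, n in counts.items())
    return weighted / total


# The frame structure of Fig. 22: 22 I-shaped and 10 rectangular columns.
frame = ["IfcColumnI-shaped"] * 22 + ["IfcColumnRect"] * 10
print(round(exchange_quality(frame), 4))  # 0.9914
```

Since only the type counts enter the result, the value can be computed before any data is actually transferred, which is the a priori character of the assessment.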

7. Conclusions
A successful and error-free exchange of digital data is one of the most important factors in enabling collaborative and well-distributed working across the borders of disciplines and organizations. The essential task, namely the mapping of schemas, has been an active subject of research during the last decade. However, the capabilities of model mapping languages are limited as they only support the structural translation of the same information. Furthermore, one of the major problems of systems interoperability in the construction industry is the difficulty in accessing accurate data, information, and knowledge in a timely manner in every phase of the construction project lifecycle. This paper provides a generic a priori approach to assess coupling quality for data structure-based coupling. This approach operates on the involved schemas and takes into account various mapping patterns. The coupling quality is evaluated within the formalization process by taking into account different sources of mapping errors. Finally, the assessment process results in global quality values and indicators, which can be used directly by the user to assess the data exchange.

8. Future directions and challenges
An error-free exchange of data is one prerequisite for collaboration and integration. The main focus in the last decades has been on the development of PDT standards in order to achieve this goal. However, an exchange of data without errors and information losses by means of standardized data models is still not possible. Hence, data assessment approaches become more and more important for detecting inconsistencies and increasing the reliability of transferred data. A challenge is the combination of the existing approaches for a priori and a posteriori data assessment into a centralized multistage analysis and assessment platform. Such an environment must be able to cope with the dynamic and distributed nature of the planning process. The combination of online software integration technologies, such as web-based, agent-based, or object-distributed systems as well as centralized databases, will be a vital component of such a system. Another challenge will be the development and improvement of new or existing quality metrics in order to ensure consistency of data and to build confidence in existing software and tools. An independent but important problem field is the area of data management mechanisms, such as transaction management, access control, version management, and change notification/propagation mechanisms. These challenges warrant further research.

Acknowledgement
This research is supported by the German Research Foundation (DFG) through Research Training Group 1462, which is greatly appreciated by the authors.
