Towards product-level parallel computing of large-scale building information modeling data using graph theory

Journal Pre-proof

Xiaoping Zhou, Jichao Zhao, Jia Wang, Ming Guo, Jiayin Liu, Honghong Shi

PII: S0360-1323(19)30770-X
DOI: https://doi.org/10.1016/j.buildenv.2019.106558
Reference: BAE 106558
To appear in: Building and Environment
Received Date: 15 August 2019
Revised Date: 11 November 2019
Accepted Date: 19 November 2019

Please cite this article as: Zhou X, Zhao J, Wang J, Guo M, Liu J, Shi H, Towards Product-level Parallel Computing of Large-scale Building Information Modeling Data using Graph Theory, Building and Environment, https://doi.org/10.1016/j.buildenv.2019.106558. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Elsevier Ltd. All rights reserved.

Towards Product-level Parallel Computing of Large-scale Building Information Modeling Data using Graph Theory

Xiaoping Zhou¹*, Jichao Zhao², Jia Wang³, Ming Guo⁴, Jiayin Liu⁵, Honghong Shi¹

¹ Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering & Architecture, Beijing 100044, China. Email: [email protected]
² CNPC Managers Training Institute, Beijing 100096, China
³ Beijing Engineering Research Center of Monitoring for Construction Safety, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
⁴ Institute of Mapping and Urban Spatial Information, Beijing University of Civil Engineering & Architecture, Beijing 100044, China
⁵ Ohio State University, Ohio 43210, USA

Abstract. Building Information Model/Modeling (BIM) upgrades the digitization of buildings from 2D to 3D and has become a common paradigm in the architecture, engineering, construction, operation, and facility management (AECO/FM) industry. Many studies have argued that the volume of BIM data keeps increasing and becomes extremely large as projects proceed. However, processing large-scale BIM data in parallel remains an open issue due to the complicated reference relationships in a BIM. This study addresses the issue by introducing graph theory. First, a novel directed reference graph (DrGraph) model is proposed to capture the reference relationships among BIM instances according to the Industry Foundation Classes (IFC) specification. On top of DrGraph, the BIM product slice (ProSlice) model is developed to split the original IFC file into small independent IFC slices at the product level. Since the generated slices are independent of each other, they can be dispatched and computed in a parallel computing cluster. The usage of ProSlice in MapReduce is illustrated using the BIM triangulation task. Finally, experimental results show that the proposed product-level parallel computing scheme outperforms floor-level schemes in both computation efficiency and internal memory usage, and that parallel computing scales the computation of BIM data. The product-level parallel computing scheme is also expected to benefit other geometric data processing and semantic data analysis tasks over large-scale BIM data.

Keywords: Building Information Model/Modeling (BIM), Industry Foundation Classes (IFC), Parallel Computing, Graph Theory, BIM Product.

1 Introduction

Building Information Model/Modeling (BIM) has been widely accepted as an emerging and

promising technology for Building Lifecycle Management (BLM) [1], because it provides a shared knowledge resource for information about the facilities of a building during its lifecycle [2]. Owing to its capability for data interoperability and collaboration among stakeholders, BIM has become a common paradigm in the architecture, engineering, construction, operation, and facility management (AECO/FM) industry [3]. Currently, BIM is adopted in almost all stages of BLM, including design collaboration [4], BIM-based n-dimensional (nD) management [5], facility management [6], and indoor localization [7], to name a few. BIM also plays an increasingly important role in smart buildings [8] and smart cities [9].

Generally, a BIM model refers to "the total sum of information" about all facilities of a building in digital form [10]. A BIM contains not only spatial/geometric data but also a great amount of engineering data accumulated throughout the lifecycle of a building. As projects proceed, the volume of BIM data keeps increasing and becomes extremely large. Hence, the ability to process large-scale BIM data efficiently has become a crucial issue.

Parallel computing [11] is a type of computation in which many calculations are executed simultaneously. In parallel computing, a computational task is typically broken down into several, often many, very similar sub-tasks that can be processed independently. Obviously, adopting parallel computing is a direct approach to improving the capability of processing large-scale BIM data. However, computing BIM data in parallel is not straightforward. On the one hand, current parallel computing frameworks are designed to process independent data slices. On the other hand, BIM data are organized in an entity-relationship model, in which BIM instances are highly interdependent.
To the best of the authors' knowledge, the augmented version of MapReduce [12], called AMR, is the only parallel algorithm that computes BIM data at the floor level. The fact that not all BIM instances are related to floors limits the applicability of AMR. Additionally, segmenting BIM data into smaller data slices can further improve computational efficiency, especially when a floor contains a great many BIM products. Processing large-scale BIM data in parallel thus remains an open issue due to the complicated reference relationships in a BIM. This study addresses the issue by proposing a novel product-level parallel computing scheme using graph theory [13]. The Industry Foundation Classes (IFC) data model [14] is the internationally accepted specification for BIM; without loss of generality, this study assumes that the BIM data are expressed as an IFC file. The contributions of this study are summarized below:

(1) Present a Directed Reference Graph (DrGraph) to model the relationships among BIM instances. DrGraph captures the dependency relationships in BIM and enables the segmentation of a BIM file using graph theory.

(2) Propose a novel BIM Product Slice (ProSlice) model on top of DrGraph. ProSlice splits the original BIM data into independent product-level BIM slices, which can be re-organized, dispatched, and computed separately in a parallel computing cluster.

(3) Illustrate the usage of ProSlice in MapReduce and evaluate the effectiveness of our product-level parallel computing scheme on the BIM triangulation task. The experimental results show that our scheme outperforms existing methods and scales the computation of BIM data.

The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 provides the necessary definitions and preliminaries, and Section 4 presents the product-level parallel computing scheme for BIM data. Section 5 describes the empirical studies, and the last section presents conclusions.

2 Related Work

2.1 Parallel Computing for BIM Data

Although many studies [12, 15, 16, 17, 18] have argued that BIM data are becoming extremely voluminous and constitute a typical form of construction Big Data, few studies have investigated parallel computing for BIM. Jiao et al. [12] proposed an augmented version of MapReduce, termed AMR, to enable parallel computing over large-scale BIM data. AMR first divides the original IFC file into small IFC files by grouping IFC instances by floor; specifically, the original IFC file is segmented by its IfcBuildingStorey instances. The split IFC slices are then dispatched and computed separately in a MapReduce framework. To the best of our knowledge, AMR is the only effort in this area, and it can be considered a floor-level parallel computing solution for BIM data. However, AMR may not be applicable in many scenarios, for the following reasons:

(1) Some BIM products are missing in AMR. Examples include IfcProject, IfcSite, and IfcBuilding. Since instances of these classes are not contained in any IfcBuildingStorey instance, they are never computed by AMR.

(2) BIM products that are not aggregated into IfcBuildingStorey instances are ignored. IfcBuildingStorey is not a mandatory element in an IFC file, and some BIM products are not associated with any IfcBuildingStorey instance. As a result, these products are not calculated by AMR.

(3) AMR suffers lower efficiency when too many BIM products aggregate on a single floor. The running time usually increases with the volume of a data slice, so an extremely large slice should be further segmented into smaller slices to improve computational efficiency through parallelization.

The above limitations hinder a wider application of AMR. This study provides a product-level parallel computing scheme for BIM data, which improves the capability of processing large-scale BIM data and lays a foundation for further BIM studies.

2.2 BIM Triangulation

In the IFC specification, 3D geometric data are represented by at least one of the following models: Boundary Representation (Brep), NURBS, Constructive Solid Geometry (CSG), or the Swept Solid Model (SweptSolid) [9]. These techniques can describe all kinds of shapes in a compressed format. However, the hybrid 3D geometric representations mean that an IFC file cannot be rendered directly in many scenarios. For example, due to the limitations of OpenGL ES and WebGL, Web systems and mobile applications cannot render complex solid geometry descriptions. This creates a primary demand to triangulate the geometry in BIM data: the triangular mesh is a widely accepted 3D shape data model, and triangles are simple geometric shapes that can be rendered directly by most software (especially 3D viewers).

Several efforts [19, 20, 21, 22] have been made in BIM triangulation. IfcOpenShell [19], a widely used BIM triangulation tool, transforms 3D shapes from IFC into triangles. To enable rendering of BIM data, Liu et al. [20] converted IFC files into the ".obj" format. Zhou et al. [21] studied the transformation of IFC files into lightweight triangular meshes to facilitate a WebGL-based online visualization system. Behr et al. [22] proposed a tool that triangulates CAD data (e.g., RVM) into ISO standards (e.g., X3D) to enable real-time visualization for plant lifecycle management. Weerasuriya et al. [23] extracted hybrid meshes of geometric models from BIM for computational fluid dynamics and energy simulation. These tools enable cross-platform visualization of BIM data. Parallel computing can help triangulate BIM geometric data efficiently and thereby enhance these applications.

3 Preliminaries and Definitions

3.1 Industry Foundation Classes (IFC)

IFC is a platform-neutral, open file format specification, which usually exists as files with the file

extension ".ifc". As a standardized digital description of the built asset industry, IFC is an international standard (ISO 16739-1:2018). Owing to its ease of interoperability among software platforms, IFC has been widely accepted as a collaboration format in BIM-based projects, facilitating interoperability in the AECO/FM industry. Specifically, IFC defines an entity-relationship model consisting of several hundred classes, organized into an object-based inheritance hierarchy. The elaborate class hierarchies and the amount of branching make it far from straightforward to process an IFC file in parallel: an IFC instance refers to, and depends on, other IFC instances.

Take Fig. 1 as an example. An IFC instance starts with '#', followed by an instance number, and ends with ';'. In total, 25 IFC instances exist in the IFC script in Fig. 1. The text "#45=IFCWALLSTANDARDCASE('3vB2YO$MX4xv5uCqZZG05x', #2, 'Wall xyz', 'Description of Wall', $, #46, #51, $);" describes an IFC instance of IfcWallStandardCase, where '#45' is the instance number. The IFC instance with number #45 is denoted i45. Since IfcWallStandardCase is an IFC class inheriting from IfcProduct, i45 is also called a BIM product instance. Apparently, i45 refers to i2, i46, and i51. Thus, i45 could be computed separately only if i2, i46, and i51, together with the IFC instances they refer to, were aggregated into the same data slice.

Definition 1 (BIM Product, BIM Product Slice). A BIM product p of a BIM product instance i is the collection of IFC instances describing the whole information of i. A BIM product is also called a BIM product slice.

Figure 1 presents a concrete example of a BIM product. Since all 25 IFC instances describe the complete information of i45, they compose the BIM product p45. Usually, a BIM product has a shape representation and an object placement; for instance, i46 and i51 contain the definitions of the object placement and the shape representation of p45, respectively. To some extent, p45 is independent of other BIM products and can be computed and analyzed separately. Note that the IFC instances referenced by i2, i20, and i36 are not included in the example. Consequently, a direct approach to enabling parallel computing of BIM data is to extract BIM products from the original IFC file.
#45 = IFCWALLSTANDARDCASE('3vB2YO$MX4xv5uCqZZG05x', #2, 'Wall xyz', 'Description of Wall', $, #46, #51, $);
#46 = IFCLOCALPLACEMENT(#36, #47);
#47 = IFCAXIS2PLACEMENT3D(#48, #49, #50);
#48 = IFCCARTESIANPOINT((0., 0., 0.));
#49 = IFCDIRECTION((0., 0., 1.));
#50 = IFCDIRECTION((1., 0., 0.));
#51 = IFCPRODUCTDEFINITIONSHAPE($, $, (#79, #83));
#79 = IFCSHAPEREPRESENTATION(#20, 'Axis', 'Curve2D', (#80));
#80 = IFCPOLYLINE((#81, #82));
#81 = IFCCARTESIANPOINT((0., 1.500E-1));
#82 = IFCCARTESIANPOINT((5., 1.500E-1));
#83 = IFCSHAPEREPRESENTATION(#20, 'Body', 'SweptSolid', (#84));
#84 = IFCEXTRUDEDAREASOLID(#85, #92, #96, 2.300);
#85 = IFCARBITRARYCLOSEDPROFILEDEF(.AREA., $, #86);
#86 = IFCPOLYLINE((#87, #88, #89, #90, #91));
#87 = IFCCARTESIANPOINT((0., 0.));
#88 = IFCCARTESIANPOINT((0., 3.000E-1));
#89 = IFCCARTESIANPOINT((5., 3.000E-1));
#90 = IFCCARTESIANPOINT((5., 0.));
#91 = IFCCARTESIANPOINT((0., 0.));
#92 = IFCAXIS2PLACEMENT3D(#93, #94, #95);
#93 = IFCCARTESIANPOINT((0., 0., 0.));
#94 = IFCDIRECTION((0., 0., 1.));
#95 = IFCDIRECTION((1., 0., 0.));
#96 = IFCDIRECTION((0., 0., 1.));

Fig. 1. An example of an IFC instance, a BIM product instance (i45), and a BIM product (p45).
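The instance-and-reference syntax above is regular enough to parse mechanically. The following Python sketch is our own illustration, not part of the paper's implementation; `parse_instance` is a hypothetical helper that extracts an instance's number and the numbers it references:

```python
import re

# Defining number at the start of an instance line, e.g. "#45 = ...".
DEF_RE = re.compile(r"^#(\d+)\s*=")
# Any "#n" token; the first hit on a line is the definition itself.
# NOTE: a robust parser must also skip '#' inside quoted strings.
REF_RE = re.compile(r"#(\d+)")

def parse_instance(line):
    """Return (instance number, [referenced numbers]) for one IFC line, or None."""
    m = DEF_RE.match(line.strip())
    if m is None:
        return None
    refs = [int(n) for n in REF_RE.findall(line)]
    return int(m.group(1)), refs[1:]

print(parse_instance("#46 = IFCLOCALPLACEMENT(#36, #47);"))  # -> (46, [36, 47])
```

Applied to the listing above, such a parser recovers exactly the dependencies that force i46 and i51 into the same data slice as i45.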

Definition 2 (BIM Directed Reference Graph). A BIM directed reference graph G = (V, E) is a directed graph, where V is the collection of IFC instances in an IFC file and E is the collection of reference relationships among the IFC instances in V. For ia, ib ∈ V, (ia, ib) ∈ E denotes a directed edge from ia to ib, meaning that IFC instance ia refers to ib.

G reflects the reference relationships among the IFC instances in an IFC file. The formation of G is fully discussed in Section 4.1.

3.2 MapReduce

MapReduce has become the de facto standard for parallel computation at terabyte or even petabyte scale [24, 25]. It is essentially a programming model, with an associated implementation, for processing large datasets via a parallel and distributed algorithm on a cluster. As shown in Fig. 2, MapReduce can be considered a five-step parallel and distributed computation running in sequence.

Fig. 2. Main processes of MapReduce. The data reader keys each input slice by a product instance, e.g. (i45, [i45, i46, i51]) and (i86, [i86, i87, i51]); after the map, shuffle, and reduce steps, the outputs are the products p45 = [i45, i46, i51] and p86 = [i86, i87, i51].

Data reader step: The data reader step prepares input for the "Map" step. It designates the Map processors and assigns the input key-value pair that each processor will work on.

"Map" step: A map() function is applied to the local data, and the output is written to temporary storage. A master node ensures that only one of any set of redundant copies of the input data is processed.

"Shuffle" step: The data produced by the "Map" step is redistributed based on the output keys, such that all data belonging to one key is located on the same worker node.

"Reduce" step: A reduce() function is executed in parallel, taking each group of output data (per key) as input.

Output step: The output step collects all the Reduce output and sorts it to produce the final outcome.

In general, these five steps can be thought of as running in sequence: each step starts only after the previous one has completed. When processing BIM data, most of the effort goes into the data reader step, because the complex reference relationships among IFC instances make it impossible to compute an IFC instance separately. Using the MapReduce framework, a massive amount of data can be processed in parallel simply by writing scripts for the map() and reduce() functions. Hadoop [26] is a widely used open-source implementation of MapReduce. This study shows how to compute BIM data in MapReduce by writing scripts for the data reader step, as well as the map() and reduce() functions.
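As a language-neutral illustration of the five steps, the classic word-count job can be simulated in plain Python. This is a sketch of the programming model only, not of Hadoop itself; `run_mapreduce` is a hypothetical helper that performs the map, shuffle, and reduce steps sequentially in one process:

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Simulate data reader -> map -> shuffle -> reduce -> output in sequence."""
    # Map step: apply map_fn to each record, collecting (key, value) pairs.
    mapped = [kv for rec in records for kv in map_fn(rec)]
    # Shuffle step: group values by key, as the framework does across workers.
    groups = defaultdict(list)
    for k, v in mapped:
        groups[k].append(v)
    # Reduce + output steps: apply reduce_fn per key and collect the results.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

lines = ["big building data", "big data"]
result = run_mapreduce(
    lines,
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda word, counts: sum(counts),
)
print(result)  # -> {'big': 2, 'building': 1, 'data': 2}
```

For BIM data, the interesting work happens before the map step: the records handed to map_fn must already be independent product slices, which is exactly what the models of Section 4 provide.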

4 Toward Parallel Computing by Extracting BIM Products

Extracting BIM products is the first and fundamental step of parallel computing for BIM data, because it splits the original IFC file into independent product-level IFC slices (Fig. 3), each of which can be computed separately. Thus, this study devotes more effort to the data reader step in MapReduce than to the other steps. The product-level BIM data segmentation is divided into two steps: the DrGraph model and the ProSlice model. The DrGraph model captures the reference relationships among the IFC instances and thereby facilitates ProSlice. The ProSlice model generates independent product-level BIM data slices on top of DrGraph.

Fig. 3. Extracting BIM products from the original IFC file.

4.1 DrGraph Model

The DrGraph model aims to extract the reference relationships among IFC instances into the directed reference graph G = (V, E). Undoubtedly, G is the basic foundation for extracting all the related IFC instances of a BIM product. Initially, G has no vertices or edges; that is, V = ϕ and E = ϕ. The DrGraph model then reads each IFC instance and parses its reference relationships. Once a new IFC instance is detected, a new vertex is added to V. Similarly, a new directed edge is added to E whenever a new reference relationship is observed. The formation of G for the example in Fig. 1 is illustrated below.

(1) When i45 is read, it is easy to see that i45 refers to i2, i46, and i51. All four IFC instances appearing in i45 are added to V, and the reference relationships among them are added to E. As a result, V = {i45, i2, i46, i51} and E = {(i45, i2), (i45, i46), (i45, i51)}.

(2) When i46 is read, i46 points to i36 and i47. Since i46 is already in V, only i36 and i47 are added to V. Meanwhile, two edges (i46, i36) and (i46, i47) are added to E. Thus, V = {i45, i2, i46, i51, i36, i47} and E = {(i45, i2), (i45, i46), (i45, i51), (i46, i36), (i46, i47)}.

(3) When i47 is read, i47 points to i48, i49, and i50. Similarly, i48, i49, and i50 are added to V, giving V = {i45, i2, i46, i51, i36, i47, i48, i49, i50}. Three new edges (i47, i48), (i47, i49), and (i47, i50) are collected into E, giving E = {(i45, i2), (i45, i46), (i45, i51), (i46, i36), (i46, i47), (i47, i48), (i47, i49), (i47, i50)}.

(4) When i48, i49, and i50 are read, no new vertices or edges are detected, so V and E are unchanged.

Figure 4 presents the formation of G after the processing of i45, i46, i47, i48, i49, and i50. In this way, when all the IFC instances have been read and parsed, a complete G is constructed. Although we illustrated the construction of G from i45, the same G is obviously obtained when starting from any other IFC instance. G of the example IFC script in Fig. 1 is finally obtained as: V = {i45, i2, i20, i46, i51, i36, i47, i48, i49, i50, i79, i80, i81, i82, i83, i84, i85, i86, i87, i88, i89, i90, i91, i92, i93, i94, i95, i96} and E = {(i45, i2), (i45, i46), (i45, i51), (i46, i36), (i46, i47), (i47, i48), (i47, i49), (i47, i50), (i51, i79), (i51, i83), (i79, i20), (i79, i80), (i80, i81), (i80, i82), (i83, i20), (i83, i84), (i84, i85), (i84, i92), (i84, i96), (i85, i86), (i86, i87), (i86, i88), (i86, i89), (i86, i90), (i86, i91), (i92, i93), (i92, i94), (i92, i95)}. Figure 5 presents G intuitively. Figure 6 presents the complete G of the example IFC file from buildingSMART¹. Obviously, the IfcProject instance i1 is the root of G. Since the IFC script in Fig. 1 is part of that example IFC file, the graph in Fig. 5 is a subgraph of the one in Fig. 6, rooted at i45.
Algorithm 1 summarizes the whole process of the DrGraph model. Line 2 initializes G. Lines 4-6 add the current IFC instance to V of G, while lines 7-12 collect the reference relationships defined in the current IFC instance. Algorithm 1 uses a dictionary structure to check whether an IFC instance i is already in V. Thus, lines 4-6 cost O(1) in time complexity. Since an IFC instance i refers to a limited number of other IFC instances, the time complexity of lines 7-12 can also be considered O(1). In sum, lines 4-12 cost O(1), and Algorithm 1 runs in O(|V|) time, where |V| is the number of IFC instances in the IFC file f.

i45 i2

i45 i2

i46

i51

(a) i45 is read.

i46

i51

i46

i36

i45 i51

i36

i47

i47

i48 i49

(b) i46 is read.

(c) i47 is read.

i36

i2

http://www.buildingsmart-tech.org/implementation/get-started/hello-world

i51 i47

i48 i49

i50

i50

(d) i48, i49, and i50 are read.

Fig. 4. Illustration of formation of Directed Reference Graph.

1

i46

Fig. 5. Directed reference graph for the IFC example in Fig. 1.

Algorithm 1: DrGraph – Construct the directed reference graph of an IFC file
Input: IFC file f
Output: Directed reference graph G
 1: function DrGraph(f):
 2:   G = (V, E), V = ϕ, E = ϕ, V_D = dict().
 3:   for each IFC instance i in f:
 4:     if V_D does not contain key i:
 5:       V = V + {i}, V_D[i] = 1.
 6:     end if
 7:     for each IFC instance j cited by i:
 8:       E = E + {(i, j)}.
 9:       if V_D does not contain key j:
10:         V = V + {j}, V_D[j] = 1.
11:       end if
12:     end for
13:   end for
14:   return G
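Algorithm 1 translates almost line for line into Python. The sketch below is our own illustration, with a simplified `parse_refs` helper standing in for a full IFC line parser; a Python set plays the role of the dictionary V_D, giving the same O(1) membership test:

```python
import re

REF_RE = re.compile(r"#(\d+)")

def parse_refs(line):
    """Return (instance number, referenced numbers) for one IFC instance line."""
    nums = [int(n) for n in REF_RE.findall(line)]
    return nums[0], nums[1:]

def drgraph(ifc_lines):
    """Construct the directed reference graph G = (V, E) of an IFC file."""
    V, E = set(), []
    for line in ifc_lines:
        i, refs = parse_refs(line)
        V.add(i)                 # lines 4-6 of Algorithm 1
        for j in refs:           # lines 7-12 of Algorithm 1
            E.append((i, j))
            V.add(j)
    return V, E

script = [
    "#45 = IFCWALLSTANDARDCASE('guid', #2, 'Wall xyz', $, $, #46, #51, $);",
    "#46 = IFCLOCALPLACEMENT(#36, #47);",
]
V, E = drgraph(script)
print(sorted(V))  # -> [2, 36, 45, 46, 47, 51]
print(E)          # -> [(45, 2), (45, 46), (45, 51), (46, 36), (46, 47)]
```

The single pass over the file reproduces the O(|V|) behavior argued above: each instance line contributes one vertex check and a bounded number of edge insertions.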

4.2 ProSlice Model

The ProSlice model aims to generate all the BIM products on top of the directed reference graph G. With G,

ProSlice works mainly in two steps: instance collection and slice formation. Instance collection gathers all the related IFC instances of a BIM product according to G. Slice formation outputs all the IFC instances of a BIM product into a separate slice.

(1) Instance collection. With G, ProSlice starts from the BIM product instances. Given a BIM product instance i, its BIM product p contains only i at the beginning. ProSlice adds the dependent IFC instances that i points to in G and marks i as traversed. When a new IFC instance is added to p, it is marked as traversed once all the IFC instances it references have been added to p. Iteratively, all the IFC instances related to p are collected. Take Fig. 4 as an example. i45 is a BIM product instance. Initially, i45 is added to p45, so p45 = {i45}. Since i45 points to i46 and i51, these two instances are added to p45 and i45 is marked as traversed; thus p45 = {i45, i46, i51}. Since i46 has not been traversed, i36 and i47, the IFC instances i46 points to, are added to p45 and i46 is marked as traversed. At this point, p45 = {i45, i46, i51, i36, i47}. Iterating in this way ultimately yields p45 = {i45, i2, i20, i46, i51, i36, i47, i48, i49, i50, i79, i80, i81, i82, i83, i84, i85, i86, i87, i88, i89, i90, i91, i92, i93, i94, i95, i96, i30, i31, i32, i33, i34, i37, i38, i39, i40, i24, i25, i26, i27, i28}.

Fig. 6. Directed reference graph for an IFC file from buildingSMART.

(2) Slice formation. After instance collection, the IFC instance numbers of each BIM product have been collected. The slice formation step writes the concrete details of each IFC instance of a BIM product into the same separate IFC slice. Moreover, the basic IFC file scaffolding, such as the IFC header definitions, must also be included in each separate BIM product IFC slice.

Algorithm 2 summarizes the whole process of the ProSlice model, which performs two steps for each BIM product instance: line 4 is the instance collection step and line 5 is the slice formation step. Lines 8-17 present a depth-first-search approach [27] to collect all the related IFC instances. Lines 18-27 give the detailed steps of slice formation. Let |p| denote the number of IFC instances in p. The instance collection process costs O(|p|) in time complexity due to the efficiency of tree traversal, and the time complexity of the slice formation process is also O(|p|). Thus, the total time complexity of data segmentation for p is O(|p|), and the time complexity of the ProSlice model is O(N × |p|), where N is the number of BIM products in the IFC file.

Algorithm 2: ProSlice – Extract BIM product IFC slices
Input: IFC file f
Output: BIM product IFC slices
 1: function ProSlice(f):
 2:   G = DrGraph(f).
 3:   for each BIM product instance i in G:
 4:     p = {i} + InstanceCollection(G, i).
 5:     SliceFormation(f, p).
 6:   end for
 7:   return.
 8: function InstanceCollection(G, i):
 9:   q = { }.
10:   Let N(i) be the collection of IFC instances that i points to.
11:   if N(i) = ϕ:
12:     return ϕ.
13:   q = N(i).
14:   for each j in N(i):
15:     q = q + InstanceCollection(G, j).
16:   end for
17:   return q.
18: function SliceFormation(f, p):
19:   Create a slice s for BIM product p.
20:   Write the IFC header to s.
21:   Write "DATA;" to s.
22:   for each i in p:
23:     Write the content of i from f to s.
24:   end for
25:   Write "ENDSEC;" to s.
26:   Write the end-of-ISO marker to s.
27:   return s.
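The two steps of Algorithm 2 can be sketched in Python as follows. This is our own illustration, using an iterative depth-first search with a visited set (which plays the role of the "traversed" marks and also guards against re-collecting shared instances); the header and footer strings are simplified placeholders for the original file's IFC header:

```python
def instance_collection(adj, i):
    """Collect instance i plus everything it transitively references in G."""
    collected, stack = set(), [i]
    while stack:
        cur = stack.pop()
        if cur in collected:          # already traversed: skip
            continue
        collected.add(cur)
        stack.extend(adj.get(cur, []))
    return collected

def slice_formation(lines_by_num, product):
    """Write the IFC instances of one BIM product into a standalone slice."""
    body = "\n".join(lines_by_num[n] for n in sorted(product))
    # Simplified scaffolding; a real slice reuses the original file's header.
    return ("ISO-10303-21;\nHEADER;\nENDSEC;\nDATA;\n"
            + body + "\nENDSEC;\nEND-ISO-10303-21;\n")

# Adjacency of the subgraph rooted at i45 (cf. Fig. 4).
adj = {45: [46, 51], 46: [36, 47], 47: [48, 49, 50]}
p45 = instance_collection(adj, 45)
print(sorted(p45))  # -> [36, 45, 46, 47, 48, 49, 50, 51]
```

Each resulting slice is a syntactically complete IFC file, which is what allows the cluster to process it without seeing any other slice.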

4.3 ProSlice in MapReduce

This subsection presents the usage of ProSlice in MapReduce to enable parallel computing of BIM

data. To ensure that each map() function receives an independent data slice, the data reader step must be customized. Hadoop supports processing a variety of data formats and types through the abstract class InputFormat, which emits the key-value pair inputs for the map() function by parsing the input data. Specifically, InputFormat splits the input file into logical data slices and provides a RecordReader implementation to glean input records from the logical data slices for processing by the map() functions. Once the input data is partitioned, Hadoop invokes a map() function for each logical data slice. In this study, we wrote an implementation of InputFormat to support BIM data as input to Hadoop MapReduce computations.

Figure 7 presents the workflow of product-level parallel computing of BIM data in MapReduce using ProSlice. First, the abstract class InputFormat is extended, and its createRecordReader() method returns a RecordReader object. The RecordReader object employs ProSlice in its initialize() function to split the original IFC file into several independent small IFC data slices. Each data slice contains all the IFC instances defining a BIM product and is keyed by the instance number of that BIM product. The RecordReader object uses a dictionary structure to store all the key-value pairs of IFC data slices. Hadoop then generates a map task for each IFC data slice and invokes the respective mapper with the key-value pair <i, p> as input. The map() function directly outputs <i, p>, while the reduce() function executes the product-level algorithms over each BIM product and emits the results.

Since only BIM products contain geometric data, the BIM triangulation task is particularly suitable for evaluating the power of product-level parallel computing. We illustrate the usage of ProSlice in MapReduce with the BIM triangulation task. Algorithm 3 summarizes the whole process of the MapReduce version of BIM triangulation, termed MR-BIMTri. Lines 1-6 show the usage of ProSlice in MapReduce; ProSlice is executed in the initialize() function of the RecordReader class. Lines 7 and 8 implement the map() function of the Mapper class, which directly emits the input key-value pair <i, p>. Lines 9 and 10 give the reduce() function of the Reducer class, which computes and emits the triangular meshes for each BIM product. Undoubtedly, other product-level data mining tasks can be upgraded to a parallel computing version by simply substituting the corresponding algorithm into line 10. When triangulating BIM geometric data, each data slice must also include the BIM product instantiated from IfcProject, because IfcProject defines the default units, the geometric representation context for the shape representations, the global positioning of the project coordinate system, and other basic information.

Fig. 7. Product-level parallel computing of BIM data in MapReduce using ProSlice. The RecordReader's initialize() method, equipped with ProSlice, splits IFC file f into keyed product slices; each slice passes through a map() task and then a reduce() task equipped with the product-level algorithms, yielding the result for each product p1, p2, …, pk.

Algorithm 3: MR-BIMTri – Triangulate BIM geometric data using MapReduce
Input: IFC file f
Output: Triangular mesh data for each BIM product
 1: function RecordReader.initialize(f):
 2:   pros = ProSlice(f).
 3:   for each p in pros:
 4:     i = instance number of p.
 5:     Store <i, p> using a dictionary structure.
 6:   end for
 7: function Mapper.map(<i, p>):
 8:   Emit <i, p>.
 9: function Reducer.reduce(<i, p>):
10:   Triangulate p and emit the triangular mesh data.
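Algorithm 3's division of labor can be mimicked without a Hadoop cluster. In the sketch below (our own illustration), `record_reader` plays the role of the customized RecordReader, map is the identity, and `triangulate` is a hypothetical stand-in for a real engine such as IfcOpenShell:

```python
def record_reader(slices):
    """Data reader step: key each product slice by its product instance number."""
    return {i: p for i, p in slices}

def mr_bimtri(slices, triangulate):
    """Identity map, then reduce each keyed product with the triangulation fn."""
    keyed = record_reader(slices)
    mapped = [(i, p) for i, p in keyed.items()]    # map(): emit <i, p> unchanged
    return {i: triangulate(p) for i, p in mapped}  # reduce(): per-product work

# Stub triangulation: report how many instances each product slice contains.
slices = [(45, [45, 46, 51]), (86, [86, 87, 51])]
meshes = mr_bimtri(slices, triangulate=lambda p: f"mesh({len(p)} instances)")
print(meshes)  # -> {45: 'mesh(3 instances)', 86: 'mesh(3 instances)'}
```

Swapping the stub for any other product-level algorithm corresponds exactly to replacing line 10 of Algorithm 3.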

5 Experiments

This section systematically evaluates the performance of the proposed parallel computing scheme for BIM data using the BIM triangulation task.

5.1 Performance with Baselines

5.1.1 Experiment Setups

Indices. Computation resource requirement and computation efficiency are two critical factors in measuring the performance of parallel computing. In BIM triangulation, the computation resource requirement is mainly attributed to the number of IFC instances expressing 3D shapes, while the running time directly reflects computation efficiency. Thus, the number of IFC instances expressing 3D shapes and the running time are used to evaluate the performance of MR-BIMTri.

Baselines. IfcOpenShell [19] is a representative BIM triangulation tool. We embedded IfcOpenShell in both MR-BIMTri and AMR to triangulate BIM geometric data, and take IfcOpenShell alone as the first baseline to illustrate the BIM triangulation time without parallel computing. AMR [12] is the only existing effort that adopts MapReduce to process BIM data, so AMR is selected as the second baseline for our proposed MR-BIMTri. Undoubtedly, IfcOpenShell could be replaced by any other BIM triangulation tool in the experiments.

Datasets. In the testing process, we utilized extensive BIM models available online² to evaluate the performance of our proposed product-level parallel computing framework. In this study, we selected four typical BIM models to present the performance of MR-BIMTri. Table 1 lists the statistics of the four BIM models. Their sizes are 47.2M, 88.3M, 181.2M, and 356.4M. They have 631, 5,410, 39,645, and 17,747 BIM products with 3D shapes, and 3, 12, 58, and 6 IfcBuildingStorey instances, respectively. Their numbers of IFC instances increase from 919,572 to 6,416,162. After triangulation, the geometric data of the four BIM models comprise 1,444,681, 1,457,086, 4,969,047, and 33,118,760 triangular meshes, respectively. Figure 8 presents the 3D shapes of the four BIM models.

Experiment Environment. Both AMR and MR-BIMTri were implemented on Hadoop 2.7.2 in the Java programming language, with IfcOpenShell as the triangulation engine in both. Note that AMR may fail to compute BIM products that are not contained in any IfcBuildingStorey instance.

Table 1. BIM models

# | Size (MB) | # of BIM products with 3D shapes | # of IfcBuildingStorey instances | # of IFC instances | # of triangular meshes | Description
1 | 47.2 | 631 | 3 | 919,572 | 1,444,681 | One-floor office model with furniture
2 | 88.3 | 5,410 | 12 | 1,694,433 | 1,457,086 | Architectural model of a 12-floor office building
3 | 181.2 | 39,645 | 58 | 2,746,559 | 4,969,047 | Architectural model of a 42-floor building
4 | 356.4 | 17,747 | 6 | 6,416,162 | 33,118,760 | MEP model of a 6-floor plant

Fig. 8. 3D shapes of the BIM models used in the experiments: (a) 47.2 MB BIM model; (b) 88.3 MB BIM model; (c) 181.2 MB BIM model; (d) 356.4 MB BIM model.
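To make the two product-assignment strategies evaluated in Section 5.1.2 concrete — random assignment, which scatters BIM products across sub-files while balancing their sizes, and type preference, which keeps products of the same IFC type together — the following sketch shows one way they could be implemented. This is an illustrative Python toy, not the authors' Java/Hadoop implementation; the product list, payload sizes, and greedy balancing heuristic are all assumptions made for demonstration.

```python
import random

# Hypothetical product records: (product_id, ifc_type, approximate_payload_size).
PRODUCTS = [
    ("p1", "IfcWall", 120), ("p2", "IfcWall", 120), ("p3", "IfcDoor", 40),
    ("p4", "IfcDoor", 40), ("p5", "IfcSlab", 200), ("p6", "IfcSlab", 200),
    ("p7", "IfcColumn", 80), ("p8", "IfcColumn", 80), ("p9", "IfcWall", 120),
]

def random_assignment(products, n_subfiles, seed=0):
    """Shuffle the products, then greedily place each into the currently
    smallest sub-file so that sub-file sizes stay similar."""
    rng = random.Random(seed)
    shuffled = products[:]
    rng.shuffle(shuffled)
    subfiles = [[] for _ in range(n_subfiles)]
    sizes = [0] * n_subfiles
    for p in shuffled:
        i = sizes.index(min(sizes))  # fill the smallest sub-file first
        subfiles[i].append(p)
        sizes[i] += p[2]
    return subfiles

def type_preference(products, n_subfiles):
    """Group products by IFC type first, then greedily pack whole type
    groups into the smallest sub-file, so products sharing a type (and
    often a 3D representation) land in the same sub-file."""
    groups = {}
    for p in products:
        groups.setdefault(p[1], []).append(p)
    # Pack larger groups first for better balance (first-fit decreasing).
    ordered = sorted(groups.values(), key=lambda g: -sum(p[2] for p in g))
    subfiles = [[] for _ in range(n_subfiles)]
    sizes = [0] * n_subfiles
    for g in ordered:
        i = sizes.index(min(sizes))
        subfiles[i].extend(g)
        sizes[i] += sum(p[2] for p in g)
    return subfiles

if __name__ == "__main__":
    for sf in type_preference(PRODUCTS, 3):
        print(sorted({p[1] for p in sf}), sum(p[2] for p in sf))
```

Both strategies keep the sub-file sizes comparable; the difference is whether a given IFC type can be spread across several sub-files.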

5.1.2 Experiment Results

AMR can only split the four original BIM models into 3, 12, 58 and 6 IFC sub-files, respectively. To make an unbiased comparison with AMR, MR-BIMTri also re-organized the BIM products in the four BIM models into 3, 12, 58 and 6 IFC sub-files, respectively. MR-BIMTri employed two strategies to assemble the BIM products: random assignment, which allocates BIM products randomly to different sub-files, and type preference, which prefers to assign BIM products of the same IFC type to the same sub-file. Both strategies keep the IFC sub-files at similar sizes. In the experiments, both AMR and MR-BIMTri were executed with five reducers in MapReduce. The running times of IfcOpenShell, AMR, and MR-BIMTri using the random assignment and type preference strategies are listed in Table 2. MR-BIMTri using type preference performed best on all four BIM models, followed by MR-BIMTri using random assignment. AMR performed much more efficiently than IfcOpenShell on the 2#, 3# and 4# BIM models. These observations show that adopting parallel computing can improve computational efficiency when processing BIM data. Since

DrGraph was utilized to split the original IFC file in both AMR and MR-BIMTri, the two frameworks spent almost the same splitting time on the same BIM model. It is worth noting that AMR required the most time to triangulate the geometric data of the 1# BIM model, because most of its BIM products were dispatched into the same sub-file; in addition, AMR requires extra time to split the original IFC file. We also observed that MR-BIMTri, under both the random assignment and type preference strategies, required less running time than AMR on all four BIM models. This is because MR-BIMTri can divide the original IFC file into several IFC sub-files of similar sizes on top of ProSlice, whereas AMR cannot regularize the size of each split file, since the relationships between BIM products and IfcBuildingStorey instances are fixed for a given IFC file. We also found that MR-BIMTri using the type preference strategy was significantly more efficient than MR-BIMTri using random assignment on all four BIM models. Specifically, MR-BIMTri using type preference reduced the running time of IfcOpenShell by 63.99%, 74.61%, 72.02% and 71.48%, and that of AMR by 67.51%, 15.67%, 50.44% and 48.16%, on the four BIM models, respectively. The reason is that BIM products of the same type may share the same 3D representation; as a result, the type preference strategy mitigates the re-triangulation of identical geometric data that occurs in both AMR and MR-BIMTri with random assignment. From this perspective, we hold that product-level parallelism is more flexible than floor-level parallelism.

Table 3 shows the maximal number of IFC instances loaded into memory by IfcOpenShell, AMR, and MR-BIMTri. A larger number of IFC instances requires more computation resources (e.g., internal memory) as well as more running time to compute the triangular meshes. Undoubtedly, the fewer IFC instances loaded into memory, the better the performance.
The maximal numbers of IFC instances loaded into memory were all smaller than the total numbers of IFC instances in the four BIM models, because the IFC instances that do not express geometric data are removed. IfcOpenShell exhibited the largest maximal number of loaded IFC instances for all four IFC files, since it has to load all the IFC instances describing geometric data at once. MR-BIMTri, using either random assignment or type preference, performed better than AMR on this index. The basic reason is that BIM products with the same shapes are usually expressed by the same IFC instances, and many floors usually share BIM products with the same shapes; as a result, AMR dispatched the same IFC instances into several IFC sub-files. MR-BIMTri using random assignment may exhibit the same phenomenon. In contrast, MR-BIMTri using type preference resolves this issue by putting BIM products of the same IFC type into the same IFC sub-file, and consequently had the smallest numbers on this index. As shown in Table 3, the maximal numbers of IFC instances loaded into memory by MR-BIMTri using type preference were only 49.90%, 10.20%, 2.41% and 22.56% of those of IfcOpenShell in the four BIM models.

Compared to AMR, MR-BIMTri using type preference reduced the maximal number of IFC instances loaded into memory by 48.80%, 61.36%, 87.73% and 68.73% in the four BIM models, respectively. In other words, MR-BIMTri using type preference could reduce the computation resources (e.g., internal memory) of AMR by as much as 87.73%, as observed in the 3# BIM model. Overall, the empirical results showed that MR-BIMTri performed best and IfcOpenShell ranked last. These observations demonstrate that parallel computing can improve processing performance, and that our product-level parallel computing is more suitable than floor-level parallel computing for the BIM triangulation task.

Table 2. Running time (s)

BIM Model | IfcOpenShell | AMR (split time) | MR-BIMTri random assignment (ProSlice time) | MR-BIMTri type preference (ProSlice time)
1# | 240.91 | 267.04 (14.88) | 103.45 (14.62) | 86.75 (14.74)
2# | 346.93 | 104.44 (16.56) | 94.42 (17.30) | 88.07 (16.78)
3# | 755.28 | 426.34 (33.52) | 276.15 (29.30) | 211.31 (30.89)
4# | 2629.40 | 1446.75 (149.78) | 856.82 (156.83) | 750.02 (151.12)

Table 3. Maximal Number of IFC instances loaded into memory

BIM Model | IfcOpenShell | AMR | MR-BIMTri random assignment (ProSlice) | MR-BIMTri type preference (ProSlice)
1# | 836,961 | 815,646 | 558,891 | 417,643
2# | 1,669,207 | 440,582 | 240,640 | 170,259
3# | 1,764,194 | 346,560 | 149,392 | 42,517
4# | 5,664,573 | 4,086,594 | 2,711,670 | 1,277,928

5.2 Effect of Parameters

Some parameters may affect the performance of parallel computing, so we also investigated the effects of the number of IFC sub-files and the number of reducers on MR-BIMTri.

We varied the number of IFC sub-files to capture its effect on the running time of MR-BIMTri. Figure 9 shows the trends of running time against the number of IFC sub-files; these experiments were conducted with five reducers. As illustrated in Fig. 9(a), the running time of MR-BIMTri using random assignment increased with the number of IFC sub-files. In the 1# BIM model, with 5, 30, 62, 68 and 76 sub-files, the running times were 103.45, 155.09, 163.16, 168.69 and 170.61 seconds, respectively; that is, 67.16 more seconds were required when the number of sub-files increased from 5 to 76. When the number of sub-files was set to 6, 10, 22, 35, 47 and 62, the running times in the 2# BIM model were 91.09, 94.43, 95.67, 103.55 and 122.65 seconds; MR-BIMTri using random assignment thus cost 31.56 more seconds with 62 sub-files than with 6 sub-files. The 3# BIM model took 210.06, 244.93, 263.77, 276.15 and 327.70 seconds with 6, 25, 41, 50 and 68 sub-files, respectively; again, the running time increased with the number of sub-files. In the 4# BIM model, the running times were 856.82, 1080.14, 1259.68, 1338.31, 1404.18 and 1630.38 seconds when splitting the model into 5, 18, 30, 43, 52 and 87 sub-files; the running time soared by as much as 90.28% when the number of sub-files rose from 5 to 87. In contrast, the running times of MR-BIMTri using type preference stayed relatively stable, as shown in Fig. 9(b). For instance, the 1# BIM model cost 86.75 seconds with 5 sub-files, 89.35 seconds with 30, 86.67 seconds with 62, 89.27 seconds with 68 and 85.70 seconds with 76 sub-files. In the 4# BIM model, the running times were 750.02, 750.02, 727.52, 705.67, 670.41 and 670.41 seconds with 5, 18, 30, 43, 52 and 87 sub-files, respectively. A similar phenomenon was observed in the 2# and 3# BIM models under the type preference policy. The reason is that, under random assignment, the IFC instances describing BIM products with the same shapes may be dispatched into more IFC sub-files, resulting in more running time with the same number of reducers; the type preference strategy mitigates this issue by putting BIM products with the same shapes into the same IFC sub-file.

The number of reducers has a significant influence on the running time in MapReduce: more reducers enable computation in more processes, which can reduce the running time. Figure 10 presents the effect of the number of reducers on running time; we increased the number of reducers from 1 to 7 in steps of 2. With more reducers, MR-BIMTri using either the random assignment or the type preference strategy triangulated the geometric data of all four BIM models more efficiently. As shown in Fig. 10(a), triangulating the 4# BIM model using random assignment cost 6363.19 seconds with 1 reducer, 2230.68 seconds with 3 reducers, 1404.18 seconds with 5 reducers and 1049.96 seconds with 7 reducers. MR-BIMTri using type preference required 3043.67, 1135.51, 723.67 and 571.85 seconds to triangulate the 4# BIM model with 1, 3, 5 and 7 reducers, respectively. Compared to a single computation unit, both strategies showed a dramatic decrease in running time with 3 reducers, and the running time continued to decline as more reducers were added. Similar decreases were observed for the other three BIM models. This further confirms that the adoption of parallel computing can scale the computation of BIM data.
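The contrasting trends in Fig. 9 can be reproduced with a toy model: when products sharing one shape representation are scattered across sub-files, that shape must be re-triangulated once per sub-file that contains it, so total work grows with the number of sub-files; grouping products by shared shape keeps total work constant. The Python sketch below uses hypothetical shape counts (10 shared shapes, 200 products), not the paper's data, purely to illustrate the mechanism.

```python
import random

# Each product is represented only by the shared shape it references;
# products of the same IFC type often reuse one representation.
# 200 products drawn from 10 shared shapes (illustrative numbers).
PRODUCTS = ["shape%d" % (i % 10) for i in range(200)]

def total_triangulations(subfiles):
    """Each distinct shape must be triangulated once in every sub-file
    containing it, so duplicated shapes mean duplicated work."""
    return sum(len(set(sf)) for sf in subfiles)

def split_random(products, k, seed=0):
    """Random assignment: shuffle, then deal products into k sub-files."""
    rng = random.Random(seed)
    shuffled = products[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def split_by_shape(products, k):
    """Type-preference analogue: keep all products sharing a shape in
    the same sub-file, so no shape is ever triangulated twice."""
    subfiles = [[] for _ in range(k)]
    for i, shape in enumerate(sorted(set(products))):
        subfiles[i % k].extend(p for p in products if p == shape)
    return subfiles

if __name__ == "__main__":
    for k in (5, 20, 50):
        print(k, total_triangulations(split_random(PRODUCTS, k)),
                 total_triangulations(split_by_shape(PRODUCTS, k)))
```

Under `split_random`, total triangulation work climbs as `k` grows; under `split_by_shape` it stays fixed at the number of distinct shapes regardless of `k`, mirroring the flat curves in Fig. 9(b).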

Fig. 9. Effect of the number of IFC sub-files on running time: (a) random assignment; (b) type preference. (Both panels plot running time (s) against the number of IFC sub-files for the 1#–4# BIM models.)

Fig. 10. Effect of the number of reducers on running time: (a) random assignment; (b) type preference. (Both panels plot running time (s) against the number of reducers for the 1#–4# BIM models.)

6 Conclusions

This study addressed the problem of processing large-scale BIM data by proposing algorithms that enable product-level parallel computing. Processing large-scale BIM data in parallel has remained an open issue owing to the complicated reference relationships within a BIM. We first employed graph theory to model the complicated reference relationships among IFC instances and proposed the Directed Reference Graph (DrGraph) model. We then developed the BIM Product Slice (ProSlice) model, a product-level segmentation algorithm for an IFC file built on top of DrGraph. Finally, we illustrated the usage of ProSlice in MapReduce, and conducted experiments on the BIM triangulation task to evaluate the performance of the proposed parallel computing scheme for BIM data. Although this study demonstrated parallel computing for large-scale BIM data in MapReduce, the scheme can readily be implemented on other parallel computing frameworks, e.g., the message passing interface [28] or Spark [29].

The results of our empirical experiments revealed that our product-level parallel computing scheme is more suitable than floor-level ones: compared to the floor-level method, it reduced the computation time by as much as 67.51% and the computation resources by as much as 87.73%. We also showed that parallel computing scales the computation of BIM data, since computational efficiency improved as more reducers were equipped in MapReduce. The innovation of this study is a novel product-level parallel computing scheme for BIM data based on graph theory. Parallel computing is a promising technology for improving the efficiency of analyzing, processing and mining large-scale BIM data; however, the high dependency among IFC instances poses challenges to the adoption of parallel computing frameworks. This study demonstrated the applicability of parallel computing to BIM data by using graph theory. It is anticipated that more BIM-related data mining tasks can be inspired by the proposed scheme and upgraded to parallel computing versions. In future work, a more generalized parallel computing scheme for BIM data will be developed by extending from IfcProduct to other levels such as IfcActor, IfcControl and IfcProcess. Additionally, the proposed parallel framework will be employed to scale more data mining tasks involving BIM, e.g., semantic enrichment of BIM products [30] and product-level BIM comparison [31].
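As a compact recap of the core idea, the DrGraph/ProSlice pipeline can be sketched in a few lines: model IFC instance references as a directed graph, then take the depth-first closure from a product instance to obtain a self-contained product slice. The miniature instance graph below is invented for illustration (real slicing operates on full IFC files via the models described above), and the instance ids and types are hypothetical.

```python
# Illustrative DrGraph as an adjacency list: an edge a -> b means
# IFC instance a references instance b. All ids/types are made up.
IFC_REFERENCES = {
    "#1=IFCWALL": ["#4=IFCLOCALPLACEMENT", "#5=IFCPRODUCTDEFINITIONSHAPE"],
    "#2=IFCDOOR": ["#4=IFCLOCALPLACEMENT", "#6=IFCPRODUCTDEFINITIONSHAPE"],
    "#4=IFCLOCALPLACEMENT": ["#7=IFCAXIS2PLACEMENT3D"],
    "#5=IFCPRODUCTDEFINITIONSHAPE": ["#8=IFCSHAPEREPRESENTATION"],
    "#6=IFCPRODUCTDEFINITIONSHAPE": ["#8=IFCSHAPEREPRESENTATION"],
    "#7=IFCAXIS2PLACEMENT3D": [],
    "#8=IFCSHAPEREPRESENTATION": [],
}

def product_slice(graph, product):
    """Depth-first closure over the reference graph: collect the product
    instance plus every instance it transitively references, yielding a
    self-contained sub-file for that product."""
    seen = set()
    stack = [product]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

if __name__ == "__main__":
    print(sorted(product_slice(IFC_REFERENCES, "#1=IFCWALL")))
```

Note that the shared instance `#8=IFCSHAPEREPRESENTATION` appears in the slices of both the wall and the door — exactly the duplication that the type preference strategy confines to a single sub-file.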

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grant nos. 71601013 and 41971350, the Beijing Natural Science Foundation under grant no. 4202017, the Youth Talent Support Program of Beijing Municipal Education Commission under grant no. CIT&TCD201904050, the Science Research Project of Beijing Municipal Education Commission under grant no. KM201810016013, the Youth Talent Project of Beijing University of Civil Engineering & Architecture (BUCEA), and the Fundamental Research Funds for BUCEA under grant nos. X18050 and KYJJ2017024.

References

[1] R. Vanlande, C. Nicolle, C. Cruz, IFC and building lifecycle management, Automation in Construction, 2009, 18(1): 70-78, https://doi.org/10.1016/j.autcon.2008.05.001.
[2] R. Eadie, M. Browne, H. Odeyinka, C. McKeown, S. McNiff, BIM implementation throughout the UK construction project lifecycle: An analysis, Automation in Construction, 2013, 36: 145-151, https://doi.org/10.1016/j.autcon.2013.09.001.
[3] J.R. Lin, Z.Z. Hu, J.P. Zhang, F.Q. Yu, A natural-language-based approach to intelligent data retrieval and representation for Cloud BIM, Computer-Aided Civil and Infrastructure Engineering, 2016, 31(1): 18-33, https://doi.org/10.1111/mice.12151.
[4] M. Oh, J. Lee, S.W. Hong, Y. Jeong, Integrated system for BIM-based collaborative design, Automation in Construction, 2015, 58: 196-206, https://doi.org/10.1016/j.autcon.2015.07.015.
[5] A. GhaffarianHoseini, T. Zhang, O. Nwadigo, A. GhaffarianHoseini, N. Naismith, J. Tookey, K. Raahemifar, Application of nD BIM Integrated Knowledge-based Building Management System (BIM-IKBMS) for inspecting post-construction energy efficiency, Renewable and Sustainable Energy Reviews, 2017, 72: 935-949, https://doi.org/10.1016/j.rser.2016.12.061.
[6] E.A. Pärn, D.J. Edwards, M.C.P. Sing, The building information modelling trajectory in facilities management: A review, Automation in Construction, 2017, 75: 45-55, https://doi.org/10.1016/j.autcon.2016.12.003.
[7] I. Ha, H. Kim, S. Park, H. Kim, Image retrieval using BIM and features from pretrained VGG network for indoor localization, Building and Environment, 2018, 140: 23-31, https://doi.org/10.1016/j.buildenv.2018.05.026.
[8] A.A. Volkov, E.I. Batov, Dynamic extension of building information model for "smart" buildings, Procedia Engineering, 2015, 111: 849-852, https://doi.org/10.1016/j.proeng.2015.07.157.
[9] X. Zhou, J. Zhao, J. Wang, D. Su, H. Zhang, M. Guo, Z. Li, OutDet: an algorithm for extracting the outer surfaces of building information models for integration with geographic information systems, International Journal of Geographical Information Science, 2019, 33(7): 1444-1470, https://doi.org/10.1080/13658816.2019.1572894.
[10] T. Cerovsek, A review and outlook for a "Building Information Model" (BIM): A multi-standpoint framework for technological development, Advanced Engineering Informatics, 2011, 25(2): 224-244, https://doi.org/10.1016/j.aei.2010.06.003.
[11] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, D. Wessel, A view of the parallel computing landscape, Communications of the ACM, 2009, 52(10): 56-67, https://doi.org/10.1145/1562764.1562783.
[12] Y. Jiao, S. Zhang, Y. Li, Y. Wang, An augmented MapReduce framework for Building Information Modeling applications, IEEE International Conference on Computer Supported Cooperative Work in Design, IEEE, 2014, https://doi.org/10.1109/CSCWD.2014.6846856.
[13] P. Erdös, Graph theory and probability, Canadian Journal of Mathematics, 1959, 11: 34-38, https://doi.org/10.4153/CJM-1959-003-9.
[14] buildingSMART, Industry Foundation Classes IFC2x Edition 3 Technical Corrigendum 1, [online] Available: http://www.buildingsmart-tech.org/ifc/IFC2x3/TC1/html (Last accessed: 11/5/2018).
[15] M. Bilal, L.O. Oyedele, J. Qadir, K. Munir, S.O. Ajayi, O.O. Akinade, H.A. Owolabi, H.A. Alaka, M. Pasha, Big Data in the construction industry: A review of present status, opportunities, and future trends, Advanced Engineering Informatics, 2016, 30(3): 500-521, https://doi.org/10.1016/j.aei.2016.07.001.
[16] J.R. Lin, Z.Z. Hu, J.P. Zhang, F.Q. Yu, A natural-language-based approach to intelligent data retrieval and representation for Cloud BIM, Computer-Aided Civil and Infrastructure Engineering, 2016, 31(1): 18-33, https://doi.org/10.1111/mice.12151.
[17] H.M. Chen, K.C. Chang, T.H. Lin, A cloud-based system framework for performing online viewing, storage, and analysis on big data of massive BIMs, Automation in Construction, 2016, 71: 34-48, https://doi.org/10.1016/j.autcon.2016.03.002.
[18] K. Ibrahim, H. Abanda, C. Vidalakis, G. Wood, BIM big data system architecture for asset management: a conceptual framework, Proceedings of the Joint Conference on Computing in Construction (JC3), 2017: 289-296.
[19] IfcOpenShell, [online] Available: http://ifcopenshell.org (Last accessed: 11/5/2018).
[20] X. Liu, N. Xie, K. Tang, J. Jia, Lightweighting for Web3D visualization of large-scale BIM scenes in real-time, Graphical Models, 2016, 88: 40-56, https://doi.org/10.1016/j.gmod.2016.06.001.
[21] X. Zhou, J. Wang, M. Guo, Z. Gao, Cross-platform online visualization system for open BIM based on WebGL, Multimedia Tools and Applications, 2019, 78(20): 28575-28590, https://doi.org/10.1007/s11042-018-5820-0.
[22] J. Behr, C. Mouton, S. Parfouru, J. Champeau, C. Jeulin, M. Thöner, C. Stein, M. Schmitt, M. Limper, M. de Sousa, T.A. Franke, G. Voss, webVis/instant3DHub: visual computing as a service infrastructure to deliver adaptive, secure and scalable user centric data visualisation, Proceedings of the 20th International Conference on 3D Web Technology, ACM, 2015: 39-47, https://doi.org/10.1145/2775292.2775299.
[23] A.U. Weerasuriya, X. Zhang, V.J. Gan, Y. Tan, A holistic framework to utilize natural ventilation to optimize energy performance of residential high-rise buildings, Building and Environment, 2019, 153: 218-232, https://doi.org/10.1016/j.buildenv.2019.02.027.
[24] J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, 2008, 51(1): 107-113, https://doi.org/10.1145/1327452.1327492.
[25] X. Zhou, X. Liang, Z. Tang, Scalable triangle discovery algorithm for large-scale scale-free network with limited internal memory, IEEE Transactions on Big Data, 2019, https://doi.org/10.1109/TBDATA.2018.2889120.
[26] Apache Hadoop, [online] Available: http://hadoop.apache.org (Last accessed: 11/5/2018).
[27] R. Tarjan, Depth-first search and linear graph algorithms, SIAM Journal on Computing, 1972, 1(2): 146-160, https://doi.org/10.1109/SWAT.1971.10.
[28] W. Gropp, E. Lusk, N. Doss, A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, 1996, 22(6): 789-828, https://doi.org/10.1016/0167-8191(96)00024-5.
[29] M. Zaharia, M. Chowdhury, M.J. Franklin, S. Shenker, Spark: Cluster computing with working sets, USENIX Conference on Hot Topics in Cloud Computing, 2010.
[30] Z. Han, M. Shang, Z. Liu, C.M. Vong, Y.S. Liu, M. Zwicker, J. Han, C.P. Chen, SeqViews2SeqLabels: Learning 3D global features via aggregating sequential views by RNN with attention, IEEE Transactions on Image Processing, 2018, 28(2): 658-672, https://doi.org/10.1109/TIP.2018.2868426.
[31] X. Shi, Y.S. Liu, G. Gao, M. Gu, H. Li, IFCdiff: A content-based automatic comparison approach for IFC files, Automation in Construction, 2018, 86: 53-68, https://doi.org/10.1016/j.autcon.2017.10.013.

Highlights

1. Graph theory can model the complicated reference relationships of BIM data.
2. Parallel computing (PC) scales the computation of BIM data.
3. A product-level PC framework using MapReduce for BIM is presented.
4. Product-level PC can reduce the running time of floor-level ones by 67.51%.
5. Product-level PC can reduce computation resources of floor-level ones by 87.73%.

Declarations of interest: none