Data mining-based damage identification of a slab-on-girder bridge using inverse analysis


Journal Pre-proofs

Meisam Gordan, Zubaidah Ismail, Hashim Abdul Razak, Khaled Ghaedi, Zainah Ibrahim, Zhi Xin Tan, Haider Hamad Ghayeb

PII: S0263-2241(19)31041-3
DOI: https://doi.org/10.1016/j.measurement.2019.107175
Reference: MEASUR 107175

To appear in: Measurement

Received Date: 8 June 2019
Revised Date: 16 October 2019
Accepted Date: 18 October 2019

Please cite this article as: M. Gordan, Z. Ismail, H. Abdul Razak, K. Ghaedi, Z. Ibrahim, Z. Xin Tan, H. Hamad Ghayeb, Data mining-based damage identification of a slab-on-girder bridge using inverse analysis, Measurement (2019), doi: https://doi.org/10.1016/j.measurement.2019.107175

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Ltd.

Meisam Gordan a*, Zubaidah Ismail a*, Hashim Abdul Razak a, Khaled Ghaedi b, Zainah Ibrahim b, Zhi Xin Tan c, and Haider Hamad Ghayeb b

a StrucHMRS Group, Department of Civil Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia
b Department of Civil Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia
c School of Civil Engineering and Built Environment, Queensland University of Technology, QLD 4001, Australia
*Corresponding authors

Abstract

Classical damage detection methods such as visual inspection have many limitations: they are time-consuming, costly and ineffective for large and complex structural systems. To overcome these difficulties, a data mining-based damage identification approach is developed in this study. The first four natural frequencies, obtained from experimental modal analysis of a slab-on-girder bridge structure, are used as the input database. The laboratory work covers single-type and multiple-type damage scenarios. The applicability of machine learning, artificial intelligence and statistical data mining techniques is examined using a Support Vector Machine (SVM), an Artificial Neural Network (ANN) and a Classification and Regression Tree (CART) to predict the model behavior and damage severity. A hybrid algorithm is then proposed in the deployment step of the Cross-Industry Standard Process for Data Mining (CRISP-DM) model. According to the results, the hybrid algorithm achieves better accuracy than the ANN technique alone.

Keywords: Data mining-based damage identification; experimental modal analysis; Cross-Industry Standard Process for Data Mining; imperialist competitive algorithm; hybrid algorithm

Nomenclature

ANN: artificial neural network
b, b_i: bias
c: regularization constant
CRISP-DM: cross-industry standard process for data mining
CART: classification and regression tree
DM: data mining
FRF: frequency response function
f(x, ω): feature space
f_i: activation function
F_i: natural frequency
i(t): measure of impurity of node t
ICA: imperialist competitive algorithm
ICA-ANN: hybrid imperialist competitive algorithm and artificial neural network
L_ε: loss function
MLP: multi-layer perceptron
MAE: mean absolute error
n: number of samples
g_j: nonlinear transformation
p(j|t): node proportions
SVM: support vector machine
SHM: structural health monitoring
Y_i, y: output data
x_i: input data
w_ij: connection weight values
ω_rpe: relative percentage error
ω_exp: natural frequencies of the modal testing
ω_num: natural frequencies of the numerical simulation
β: coefficient of imperialist competitive algorithm
ε: radius of the tube
ξ_i: non-negative slack variables
Φ: nonnegative function

1.

Introduction

Civil infrastructure, such as tall buildings, long-span bridges, truss structures and large hydraulic structures, is usually subjected during its service life to damage from causes such as fatigue, aging, environmental effects, overloading, delamination, creep, corrosion, fracture, earthquakes, wind excitation or blast loads, which can critically compromise its integrity, serviceability and safety [1–6]. Structural health monitoring (SHM) techniques are frequently used to detect such damage [7–9]. Classical methods such as visual inspection, chain drag, Acoustic Emission (AE), Impact Echo (IE), X-ray, Impulse Response (IR), radiography, eddy current techniques, thermal field methods, dye penetrant and ultrasonic methods are common damage detection techniques. However, these methods have many limitations. For example, they usually require prior knowledge of the damage location, and they are time-consuming, costly and ineffective for large and complex structural systems [10]. Consequently, more precise and faster methods are needed to monitor the health condition of structures. It should also be noted that the monitoring process in SHM generates large amounts of data, from which valuable knowledge must be extracted. One of the newer approaches in SHM consists of two major components: a network of sensors to collect the response data, and data mining (DM) to extract information on the structural health condition [11].

Data mining (DM) is an emerging computing tool that has been rapidly embraced by civil engineers [12,13]. It is the process of analyzing observational datasets to obtain understandable and useful relationships between data in the form of models or patterns, and finally making predictions using these models [14]. Hence, DM can play a significant role in discovering hidden patterns (knowledge) in large raw databases, especially for complex and time-consuming problems that cannot be solved with traditional statistical techniques [15]. A systematic tool is essential to obtain satisfactory results from DM analysis in SHM [8]. Several such tools exist, e.g., DMAIC (Define-Measure-Analyze-Improve-Control), Knowledge Discovery in Databases (KDD), SEMMA (Sample-Explore-Modify-Model-Assess) and the Cross-Industry Standard Process for DM (CRISP-DM) [16–21]. Among these tools, the most common data mining model is CRISP-DM [22].

Prediction is one of the most popular functions in DM. In SHM, it has been implemented through different algorithms, such as artificial neural networks [23], fuzzy systems [24], support vector machines [25], regression analysis [26], principal component analysis [27], Bayesian analysis [28], decision trees [29], genetic algorithms [30], particle swarm optimization [31] and ant colony optimization [32], to detect damage in structural systems. In recent years, ANNs have gained significant interest for damage identification of structures due to their pattern recognition capacity [33]. Decision trees can also predict the relationship between objects from one or more variables. Their outcomes are easy to use and understand, and they are more interpretable than other statistical techniques. Decision tree algorithms include C5.0 (which builds on ID3, Iterative Dichotomiser 3), classification and regression trees (CART), and the chi-squared automatic interaction detector (CHAID). Both CART and CHAID use a rule set to classify the results for an input dataset; however, data preparation in CART is faster than in CHAID. Moreover, SVM is a machine learning algorithm that has been applied to data classification and pattern recognition in numerous tasks owing to its high-quality predictions.

This research investigates the applicability of data mining to damage identification of civil structures. Implementing data mining in structural damage identification requires a systematic model and applicable data mining methods. Therefore, the CRISP-DM model, which provides a framework for data mining projects, is applied to the damage detection process in six steps. The first step starts with experimental modal analysis, in which several damage scenarios are created and the required data are obtained in the form of the natural frequencies of the first four modes for the intact and damaged models. After the sensor data are collected, the preparation step generates the final database, split into training and testing sets, as the input for the modeling step. The applicable algorithms, namely SVM, ANN and CART, are then examined to predict the model behavior and damage severity. In the evaluation step, the performance of the models is compared by means of the Mean Absolute Error (MAE) and their correlations. Afterwards, a hybrid algorithm combining the imperialist competitive algorithm (ICA) and ANN (ICA-ANN) is proposed as the deployment of this study. In addition, because no particular data mining framework exists in SHM and there is a pressing need to improve fault diagnosis of structures, a data mining-based architecture is proposed for damage identification of structures.

2.

Experimental Modal Analysis

Mode identification methods can be divided into operational and experimental modal analysis. Operational modal analysis relies only on output measurements, whereas experimental modal analysis uses both input excitation and output response measurements to estimate modal parameters [34–36]. Experimental modal analysis can be conducted using two different methods, namely ambient and forced vibration. In the forced vibration method the input force is controlled, whereas in the ambient vibration method it cannot be controlled. In most vibration tests, the input excitation and output response are measured in the time domain; however, it is difficult to study damage identification in that form. Therefore, time-domain data are transformed into the frequency domain (e.g., by the Fourier transform), and modal-domain data are then extracted from the frequency-domain data through modal analysis. Modal-domain methods play a significant role in structural damage identification and are much more popular than time-domain or frequency-domain approaches, because modal properties such as natural frequencies, modal damping and mode shapes have a physical meaning and are easier to interpret than the mathematical features obtained from the time or frequency domain [37–39]. In practical engineering, structural modal parameters are usually identified with frequency-domain methods [40]. Accordingly, in recent years, damage detection by measuring natural frequency changes has been used to investigate various structures and to solve forward and inverse problems [41–45]. For example, Qu et al. [46] presented a step-wise procedure for frequency identification of practical bridges, in which the proposed procedures, based on higher-order spectrum theory, effectively reduced the influence of random environmental noise. Qu et al. [47] also presented an improved stabilization diagram to identify the natural frequencies and the corresponding modal parameters, i.e., damping ratios and mode shapes, of a practical cable-stayed bridge. In that study, a criterion called the modal response contribution index (MRCI) was proposed to reduce the influence of spurious modes in the eigensystem realization algorithm, and it was concluded that the modified stabilization diagram and the MRCI could successfully identify the modal parameters. The proposed stabilization diagram was subsequently employed in the eigensystem realization algorithm to distinguish spurious modes [48].
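The step from time-domain measurements to a natural frequency can be illustrated with the simplest possible procedure: a discrete Fourier transform followed by peak picking. This is a minimal sketch, not the FRF-based identification used in this study; the synthetic single-mode signal, its 2 % damping, the sampling settings and the function names `dft_magnitude` and `peak_frequency` are all assumptions chosen for illustration.

```python
import cmath
import math

def dft_magnitude(signal, fs):
    """Single-sided magnitude spectrum of a real signal via a direct DFT."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(abs(acc))
    return freqs, mags

def peak_frequency(signal, fs):
    """Naive peak picking: frequency of the largest non-DC spectral bin."""
    freqs, mags = dft_magnitude(signal, fs)
    k_peak = max(range(1, len(mags)), key=mags.__getitem__)
    return freqs[k_peak]

# Synthetic single-mode free decay (assumed values): 31.77 Hz, 2 % damping,
# sampled at 128 Hz for 512 samples, giving a bin spacing of 0.25 Hz.
fs, n, f_n, zeta = 128.0, 512, 31.77, 0.02
signal = [math.exp(-zeta * 2 * math.pi * f_n * i / fs)
          * math.sin(2 * math.pi * f_n * i / fs) for i in range(n)]
print(peak_frequency(signal, fs))   # 31.75 (nearest 0.25 Hz bin)
```

The estimate is limited by the bin spacing fs/n; practical modal analysis refines it with curve fitting of measured FRFs rather than relying on raw bin peaks.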


3.

Data Mining

Data mining (DM) is the process of identifying valuable information in different databases, and it can handle qualitative analysis of complex real-world behavior. The knowledge obtained from raw data by the data mining process is mostly characterized as models or patterns, and it is essential that this knowledge be valid, authentic, novel and understandable, with meaningful relationships drawn from large quantities of data [49]. DM can generally be categorized into predictive mining, descriptive mining, prescriptive mining and hybrid models, each with specific functionalities and various techniques. The key aim of predictive DM is to forecast a continuous value, whereas the main task of descriptive DM is to find relevant, frequently occurring features within a group. Prescriptive DM focuses on using optimization and simulation algorithms to find the best outcome. Hybrid DM techniques combine different algorithms to construct more reliable and less sensitive models [50]. CRISP-DM, the most commonly used DM model, was established by a consortium consisting of NCR (National Cash Register) Systems Engineering Copenhagen (USA and Denmark), DaimlerChrysler AG (Germany), Integral Solutions Ltd. (ISL)/SPSS (USA) and OHRA, an insurance company in the Netherlands [19]. The main phases of the data mining process in the CRISP-DM model are business understanding, data understanding, data preparation, modeling, evaluation and deployment.


4.

Methodology

With the development of inverse data analysis and to cope with unknown-input problems, DM should be combined with SHM to improve the use of the obtained data in complicated structural systems. Therefore, in this study, the natural frequencies of the first four modes of a simply supported slab-on-girder bridge structure, obtained through an experimental modal test, were used as the input database for data mining. Data mining analysis requires a systematic model and applicable data mining methods. Taking advantage of CRISP-DM, the present study proposes a generalized form of this tool for SHM systems, as shown in Figure 1. In this model, the first step describes the specimen and the experimental setup of the dynamic test for experimental modal analysis. In the second step, operational data are obtained from vibration tests of the intact and damaged structures through frequency response function (FRF) measurements to collect the database. Data preparation then preprocesses the measured data into modal parameters to generate a database as the input for the next step, modeling. In the modeling step, applicable machine learning, artificial intelligence and statistical algorithms are used to train on the database and generate the test design for damage identification of the structure; SVM, ANN and CART represent the machine learning, artificial intelligence and statistical methods, respectively. Subsequently, in the fifth step, the performance of the patterns is evaluated to ensure the accuracy of the results and to find the best pattern through the Mean Absolute Error (MAE) and their correlations. Finally, the last step extracts valuable knowledge from the data mining process through knowledge discovery of the damage severity. To this end, a hybrid ICA-ANN algorithm and a fault diagnosis scheme are proposed in the deployment step.

Fig. 1. Flowchart of the proposed damage identification method
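The correspondence between the six CRISP-DM phases and the steps of the proposed workflow can be kept in a small ordered data structure. This is a minimal Python sketch under the assumption that each phase is a function that receives the previous phase's result; the names `CRISP_DM_TO_SHM` and `run_pipeline` are hypothetical and are not part of IBM SPSS Modeler or any other tool used in the study.

```python
# Six CRISP-DM phases paired with the corresponding steps of the SHM workflow
# described in this section (illustrative labels only).
CRISP_DM_TO_SHM = [
    ("Business understanding", "Target identification: specimen and test setup"),
    ("Data understanding", "Modal analysis and data collection (FRF measurements)"),
    ("Data preparation", "Database construction from modal parameters"),
    ("Modeling", "SVM, ANN and CART trained on the database"),
    ("Evaluation", "MAE and correlation of each model"),
    ("Deployment", "Knowledge discovery: hybrid ICA-ANN and fault diagnosis"),
]

def run_pipeline(handlers):
    """Run the phases in CRISP-DM order, passing each result to the next phase.

    `handlers` maps a phase name to a function(previous_result) -> result.
    Phases without a handler pass the previous result through unchanged.
    """
    result = None
    for phase, step in CRISP_DM_TO_SHM:
        handler = handlers.get(phase, lambda prev: prev)
        result = handler(result)
        print(f"{phase:22s} -> {step}")
    return result

final = run_pipeline({
    "Data preparation": lambda _: "database",
    "Modeling": lambda db: f"models({db})",
})
print(final)   # models(database)
```

Keeping the phases in one ordered list makes the sequence explicit and lets individual steps be replaced (for example, a different modeling technique) without touching the rest of the workflow.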

4.1. Target Identification

In this step, the focus is on understanding the project objectives and converting them into a data mining problem definition. A preliminary project plan is then proposed to achieve the objectives. The main objective of the present study is to detect damage in a slab-on-girder bridge structure. The structure consisted of three steel I-beams attached to a slab through shear stud connectors, and it was used as a benchmark structure for experimental modal analysis (see Figure 2(a)). The slab specimen was 3200 mm long, and the supports were set 100 mm from the slab ends. The slab was 1200 mm wide and 100 mm deep. The beams had a flange width of 75 mm, a section depth of 150 mm, and flange and web thicknesses of 7 mm and 5 mm, respectively. For the steel, a Young's modulus of 2.1 × 10^10 kgf/m² (about 206 GPa), a Poisson's ratio of 0.3 and a density of 7850 kg/m³ were used, whereas for the concrete, a density of 2400 kg/m³ and a strength of 37.43 MPa were used. A mesh reinforcement panel with 5 mm diameter bars at 100 mm spacing was used to reinforce the concrete slab. Sixteen shear studs, each 16 mm in diameter and 75 mm high at 200 mm c/c spacing, connected the I-beams to the slab. The experimental dynamic test setup in the laboratory is shown in Figure 2(b).

Fig. 2. Vibration test procedure: (a) schematic view and dimensions of the slab-on-girder bridge, (b) experimental setup of the test and (c) arrangement of the accelerometers and the excitation point

Modal testing of the slab-on-girder bridge structure started with the healthy structure to extract the flexural modes. Accelerometers were placed at 48 points along the centerlines of the steel beams between the supports, as illustrated in Figure 2(c). The figure also shows that the shaker was located at point No. 19. The excitation point must be chosen so that it is not a node point of at least the first four modes; point No. 19 satisfied this requirement and was therefore selected as the excitation point.

4.2.

Modal Analysis and Data Collection

This step starts with data collection. In the present study, experimental modal analysis of the slab-on-girder bridge was carried out to collect the first four natural frequencies, identified from frequency response function (FRF) measurements under various predetermined damage scenarios, as the datasets for the data mining process. The analogue input signals of the sensors and the shaker were amplified in the time domain and converted into digital output signals by a data acquisition system (OROS). The modal testing parameters of the software platform (NVGate) are listed in Table 1.

Table 1. Experimental modal analysis parameters in NVGate
Parameter | Value
Sampling rate | 5.12 kS/s
Frequency bandwidth | 2500 Hz
No. of FRF points | 6401
Frequency resolution | 0.39 Hz
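The settings in Table 1 can be cross-checked with simple arithmetic. This sketch assumes that the 6401 points are spectral lines spanning 0 Hz to the 2500 Hz bandwidth, so the resolution is the bandwidth divided by 6400 intervals; the helper name `acquisition_summary` is hypothetical.

```python
def acquisition_summary(sampling_rate_hz, bandwidth_hz, n_lines):
    """Return (frequency resolution in Hz, record length in s, Nyquist check).

    Assumes n_lines spectral lines evenly spaced from 0 Hz to bandwidth_hz.
    """
    resolution = bandwidth_hz / (n_lines - 1)      # delta f
    record_length = 1.0 / resolution               # T = 1 / delta f
    nyquist_ok = bandwidth_hz <= sampling_rate_hz / 2
    return resolution, record_length, nyquist_ok

df, T, ok = acquisition_summary(5120.0, 2500.0, 6401)
print(f"df = {df:.4f} Hz, T = {T:.2f} s, within Nyquist: {ok}")
# df = 0.3906 Hz, T = 2.56 s, within Nyquist: True
```

Under this assumption the 0.39 Hz resolution in Table 1 corresponds to a 2.56 s record, and the 2500 Hz band comfortably contains the fourth mode (about 565 Hz).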

First, modal testing was performed on the undamaged state as the reference (benchmark) model, and the first four flexural modes were extracted using the ICATS software. Single-type and multiple-type damage scenarios were then created by introducing damage at three locations of the beams (L/4, L/2 and 3L/4), as indicated in Figure 3. As Figure 3(a-b) shows, in the single-type damage state, beam 1 was damaged exactly at its mid-span, whereas in the multiple-type damage state, beams 1 and 3 were damaged at L/4 and 3L/4 of the span length, respectively. For each damage scenario, a total of 25 damage severities were considered, with a damage width of 5 mm and depths from 3 mm to 75 mm in 3 mm increments, as illustrated in Figure 3(c). The results of the experimental modal analysis were used as the datasets for the DM process.
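The damage cases described above can be enumerated programmatically. This minimal sketch assumes one record per scenario and depth; the labels and dictionary keys are hypothetical, and the intact reference measurements are not included.

```python
# 25 damage depths: 3 mm to 75 mm in 3 mm steps (damage width 5 mm).
DEPTHS_MM = list(range(3, 76, 3))

# Damage locations per scenario, as described in the text (labels illustrative).
SCENARIOS = {
    "single-type": ["beam 1 @ L/2"],
    "multiple-type": ["beam 1 @ L/4", "beam 3 @ 3L/4"],
}

cases = [
    {"scenario": name, "locations": locs, "depth_mm": d, "width_mm": 5}
    for name, locs in SCENARIOS.items()
    for d in DEPTHS_MM
]

print(len(DEPTHS_MM), len(cases))   # 25 50
```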

Fig. 3. Damage scenarios of the slab-on-girder bridge structure: (a) single-type damage state, (b) multiple-type damage state and (c) damage depth of the beams

4.3.

Database Construction

This step includes all the processes needed to build, from the primary raw data, the final dataset that is fed into the modeling techniques. Its stages and tasks comprise data selection (including attribute selection and tabulation of the data), data cleaning, data construction, data integration, data formatting and data transformation for the modeling step. Based on the damage introduced in the previous step, the natural frequencies of the intact state and of the damaged states at the three locations were tabulated; the first four natural frequencies of the undamaged models are listed in Table 2, and the damaged-state frequencies are plotted in Figures 4 and 5. Table 2 indicates that, although all models were nominally identical, their measured modal parameters differed. Figures 4 and 5 show the reduction of the natural frequencies as the damage increases. According to Figure 4, the second and fourth modes show the smallest reduction. This is because the damage was inflicted at mid-span, which is a node point of these two modes, so only minor changes were observed in them. Figure 5, which corresponds to the multiple-type damage state, shows a clear reduction in the natural frequencies of the first three modes. However, because the damage locations in beams 1 and 3 (L/4 and 3L/4 of the span length) are node points of the fourth mode, this mode experienced only slight changes in natural frequency. At the 36 mm damage state, a minor fluctuation of the natural frequencies, not attributable to the modal testing itself, was observed in the second mode of the single-type scenario and in the fourth mode of the multiple-type scenario. This variation might have been caused by unwanted environmental vibrations and noise.

Table 2. Experimental natural frequencies (Hz) of the first four modes at the undamaged state
Damage scenario | Damage location | F1 | F2 | F3 | F4
Single-type damage | L/2 of beam 1 | 31.77 | 248.30 | 391.28 | 565.45
Multiple-type damage | L/4 of beam 1 and 3L/4 of beam 3 | 31.30 | 258.87 | 389.72 | 559.19
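The node-point arguments used here (and for choosing the excitation point) can be checked on an idealized model. This sketch assumes that each girder's flexural modes behave like those of a simply supported Euler-Bernoulli beam, with mode shapes sin(nπx/L); it ignores the composite slab action and any torsional modes, and the helper names are hypothetical.

```python
import math

# Idealization (assumption): simply supported Euler-Bernoulli beam modes,
# phi_n(x) = sin(n * pi * x / L), with x expressed as a fraction of L.
def mode_shape(n, x_over_L):
    return math.sin(n * math.pi * x_over_L)

def is_node(n, x_over_L, tol=1e-9):
    """True if x/L is a node (zero modal amplitude) of mode n."""
    return abs(mode_shape(n, x_over_L)) < tol

for label, x in [("L/4", 0.25), ("L/2", 0.5), ("3L/4", 0.75)]:
    nodes = [n for n in range(1, 5) if is_node(n, x)]
    print(f"{label}: node of modes {nodes}")
# L/4: node of modes [4]
# L/2: node of modes [2, 4]
# 3L/4: node of modes [4]
```

Under this idealization, damage at mid-span barely affects modes 2 and 4, and damage at L/4 and 3L/4 barely affects mode 4, which matches the trends described for Figures 4 and 5. A good excitation point keeps |phi_n| well away from zero for n = 1 to 4.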

Fig. 4. First four natural frequencies of single-type damage: (a) mode 1, (b) mode 2, (c) mode 3 and (d) mode 4

Fig. 5. First four natural frequencies of the multiple-type damage state

The data type is "continuous", and five fields are considered for the data analysis in IBM SPSS Modeler (i.e., the four natural frequencies as inputs and the damage severity as the target). Statistics of the natural frequencies are shown in Table 3. The dataset is divided into training and testing partitions containing 70% and 30% of the data, respectively.

Table 3. Statistical parameters of the natural frequencies (Hz)
Statistical parameter | F1 | F2 | F3 | F4
Min | 30.42 | 244.99 | 373.51 | 553.89
Max | 31.77 | 258.87 | 391.28 | 565.45
Mean | 31.08 | 249.26 | 381.21 | 558.08
Standard deviation | 0.41 | 3.62 | 4.59 | 2.98
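The 70%/30% partitioning and the column statistics of Table 3 can be sketched as follows. The records are hypothetical placeholders, not the measured data; the random shuffle with a fixed seed and the sample (n - 1) standard deviation are assumptions, since the exact partitioning and statistics settings used in IBM SPSS Modeler are not reported here.

```python
import random
import statistics

def train_test_split(records, train_percent=70, seed=0):
    """Shuffle a copy of `records` and split it into training/testing partitions."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_percent // 100    # integer arithmetic, no float rounding
    return rows[:n_train], rows[n_train:]

def column_stats(rows, field):
    """Min, max, mean and sample standard deviation of one field (cf. Table 3)."""
    values = [row[field] for row in rows]
    return min(values), max(values), statistics.mean(values), statistics.stdev(values)

# Hypothetical placeholder records (NOT the measured data): four natural
# frequencies F1..F4 in Hz as inputs and damage severity in mm as the target.
records = [
    {"F1": 31.8 - 0.01 * k, "F2": 248.3 - 0.05 * k,
     "F3": 391.3 - 0.3 * k, "F4": 565.4 - 0.1 * k, "severity_mm": k}
    for k in range(50)
]
train, test = train_test_split(records)
print(len(train), len(test))                    # 35 15
print(column_stats(train, "F3"))
```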

4.4.

Modeling

In this stage, several DM techniques can be applied to the database. The steps of this phase include DM technique selection, generation of the test design, and creation and assessment of the models. In general, several DM algorithms exist for the same problem type, and they can be selected for different purposes such as clustering, prediction, classification, exploration and association [51]. The applied methods are described below.

4.4.1. Support Vector Machine (SVM)

SVM, introduced by Vapnik [52], is a supervised learning method. An SVM builds an input-output mapping function from a labeled training dataset, and this function can solve both classification and regression problems. For classification, SVMs separate data with different class labels by determining a set of support vectors, i.e., members of the training input set that define a hyperplane in a feature space. A kernel function provides a generic mechanism for fitting the hyperplane surface to the training data. A linear model f(x, ω) is then constructed in this feature space, as expressed by Eq. (1):

$$f(x,\omega) = \sum_{j=1}^{m} w_j\, g_j(x) + b \tag{1}$$

where w_j, j = 1, …, m, are the components of the weight vector ω, g_j(x) is a set of nonlinear transformations and b is a bias term. The estimation quality is measured by the ε-insensitive loss function L_ε:

$$L_\varepsilon\big[y, f(x,\omega)\big] = \begin{cases} 0 & \text{if } \lvert y - f(x,\omega)\rvert \le \varepsilon \\ \lvert y - f(x,\omega)\rvert - \varepsilon & \text{otherwise} \end{cases} \tag{2}$$

where ε is the radius of the tube located around f(x, ω) and y is the output.
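Eqs. (1) and (2), together with the regularized objective minimized in Eq. (3), can be made concrete with a few lines of code. This is a minimal sketch, not the SVM implementation used in the study: the feature map g(x) = (x, x²), the data and the weights are hypothetical, and the slack variables are set to their smallest feasible values, which for fixed ω and b equal the ε-insensitive losses. A real SVR solver optimizes ω and b (usually through the kernelized dual problem) rather than evaluating a fixed guess.

```python
def features(x):
    """Hypothetical nonlinear transformations g_j(x), j = 1..m (here m = 2)."""
    return [x, x * x]

def model(x, w, b):
    """Eq. (1): f(x, w) = sum_j w_j * g_j(x) + b."""
    return sum(wj * gj for wj, gj in zip(w, features(x))) + b

def eps_insensitive(y, f, eps):
    """Eq. (2): zero inside the eps-tube, |y - f| - eps outside it."""
    return max(0.0, abs(y - f) - eps)

def svr_objective(xs, ys, w, b, c, eps):
    """Objective of Eq. (3) with each xi_i + xi_i* set to L_eps(y_i, f(x_i))."""
    penalty = sum(eps_insensitive(y, model(x, w, b), eps) for x, y in zip(xs, ys))
    return 0.5 * sum(wj * wj for wj in w) + c * penalty

xs, ys = [0.0, 1.0, 2.0], [1.0, 2.1, 5.5]
w, b = [0.5, 0.5], 1.0      # f(x) = 0.5x + 0.5x^2 + 1 gives 1.0, 2.0, 4.0
print(eps_insensitive(2.1, 2.0, eps=0.2))   # 0.0 (inside the tube)
print(svr_objective(xs, ys, w, b, c=1.0, eps=0.2))
```

Only the third point lies outside the tube, so it alone contributes to the penalty term; a larger c would push the solver to fit it more closely at the cost of a larger ||ω||.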

The weight vector and bias of the SVM can be optimized by minimizing the following function:

$$\min_{\omega,\,b}\ \frac{1}{2}\lVert\omega\rVert^{2} + c\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \tag{3}$$

where ξ_i and ξ_i^*, i = 1, …, n, are non-negative slack variables and c (c ≥ 0) is a regularization constant.

4.4.2. Artificial Neural Network (ANN)

Artificial Neural Network (ANN) algorithms formulate artificial neuron models inspired by biological neurons, and they can be used when the relationship between input and output is complex or when conventional computation is too time-consuming. The principal components of an artificial neuron are the connection weights (w_ij), the bias (b_i) and the activation function (f_i). With x_j denoting the inputs and Y_i the output of neuron i, the neuron computes:

$$Y_i = f_i\Big(\sum_{j=1}^{n} w_{ij}\, x_j + b_i\Big) \tag{4}$$
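Eq. (4) and its stacking into layers can be sketched directly. This minimal example assumes a logistic (sigmoid) activation in the hidden layer and a linear output neuron, a common choice for a regression target such as damage severity; the weights, the 4-3-1 layout and the function names are hypothetical, and training by back-propagation is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w_i, b_i, f=sigmoid):
    """Eq. (4): Y_i = f_i(sum_j w_ij * x_j + b_i)."""
    return f(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)

def layer(x, W, b, f=sigmoid):
    """A fully connected layer: Eq. (4) evaluated for every neuron i."""
    return [neuron(x, w_i, b_i, f) for w_i, b_i in zip(W, b)]

def mlp(x, W1, b1, W2, b2):
    """Input -> sigmoid hidden layer -> linear output layer."""
    hidden = layer(x, W1, b1)
    return layer(hidden, W2, b2, f=lambda z: z)

# Hypothetical weights: 4 inputs (scaled F1..F4), 3 hidden neurons, 1 output.
x = [0.2, -0.1, 0.5, 0.0]
W1 = [[0.1, 0.2, -0.3, 0.4], [0.0, -0.5, 0.2, 0.1], [0.3, 0.3, 0.3, 0.3]]
b1 = [0.0, 0.1, -0.1]
W2, b2 = [[1.0, -1.0, 0.5]], [0.2]
print(neuron([0.0, 0.0], [1.0, 1.0], 0.0))   # 0.5
print(mlp(x, W1, b1, W2, b2))
```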

A conventional neural network comprises an input layer, a hidden layer and an output layer. The neurons in the input layer represent the values of the independent variables, the neurons in the hidden layer serve only for computation, and the neurons in the output layer calculate the dependent variables. In general, signals are received at the input layer, pass through the hidden layer with varying connection strengths (weights), and reach the output layer. Neural networks can be categorized by their topology, e.g., feedforward and feedback networks. Initially, all weights are random, and the network learns from the training signals: the ANN adjusts the weights repeatedly until the predicted outputs replicate the actual measured data with increasing precision. Among different ANNs, the Multi-Layer Perceptron (MLP) is the most popular in structural identification problems. Multi-layer feedforward networks and back-propagation networks are commonly used, and back-propagation is one of the best-developed algorithms for training multi-layer perceptron networks.

4.4.3. Classification and Regression Tree (CART)

CART is a decision tree method that constructs a classification or regression tree depending on whether its dependent variable is categorical or numerical. Its results are easy to understand and support human decision making in tree induction. CART analysis is well suited to prediction problems: when the target variable is discrete, a classification tree is developed, whereas a regression tree is developed for a continuous target variable. A CART model can be developed in three steps: tree growing, creation of split rules, and Gini reduction. The main task of tree growing is to minimize the "impurity" of the two child nodes. Split rules are then generated so that each child node is more homogeneous than its parent node. Finally, all splits are evaluated with the Gini reduction criterion to maximize homogeneity. The impurity of a node in a classification tree can be defined as:

$$i(t) = \Phi\big(p(1\mid t),\, p(2\mid t),\, \ldots,\, p(j\mid t)\big) \tag{5}$$

where i(t) is the impurity of node t, p(j|t) is the node proportion (i.e., the proportion of cases in node t belonging to class j), and Φ is a nonnegative function [53].
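The Gini index is one common choice of Φ in Eq. (5): i(t) = 1 - Σ_j p(j|t)². This sketch computes it and the impurity reduction of a binary split, which CART maximizes when choosing split rules. The node contents and class labels are hypothetical; note also that for a continuous target, regression trees usually use a least-squares (variance) impurity instead of the Gini form shown here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum_j p(j|t)^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_reduction(parent, left, right):
    """Impurity decrease of a binary split (the quantity CART maximizes)."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# Hypothetical node: severity classes split on a threshold of F3.
parent = ["low", "low", "low", "high", "high", "high"]
print(gini(parent))                                     # 0.5
print(gini_reduction(parent, parent[:3], parent[3:]))   # 0.5 (pure children)
```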

4.4.4. ANN-Based Imperialist Competitive Algorithm (ICA-ANN)

The Imperialist Competitive Algorithm (ICA) is a recent meta-heuristic algorithm. This evolutionary algorithm is inspired by human sociopolitical evolution and imperialistic competition. The process starts with an initial population of countries, which are divided into two categories: colonies and imperialists. The imperialists are the countries with the lowest cost, and each imperialist together with its colonies forms an empire. Each empire attempts to take control of the colonies of other empires; consequently, colonies move towards their imperialists, and weaker empires become smaller. The countries converge to the global minimum within the powerful empires, and a termination criterion ends the iterative process. ICA aims to find the global maximum or minimum of a fitness function, and it has shown high performance in reaching global optima with fast convergence compared with other meta-heuristic algorithms [54]. In this study, ICA is applied in the training process of the ANN to initialize the network weights; ICA has also been applied with ANNs to determine the parameters of the network structure [55]. The flowchart of the proposed hybrid algorithm is shown in Figure 6.
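The ICA steps described above can be sketched as a small optimizer. This is a simplified illustration, not the authors' implementation: the assimilation deviation angle is omitted, the weakest colony always goes to the strongest empire instead of a probabilistically chosen one, and all parameter values are assumptions. The test function `sphere` stands in for the cost; in an ICA-ANN scheme the cost would instead be the network's training error as a function of its flattened weight vector, and the best country would serve as the initial weights for back-propagation.

```python
import random

def ica_minimize(cost, dim, bounds, n_countries=40, n_imperialists=4,
                 n_decades=200, beta=2.0, revolution_rate=0.1,
                 colony_weight=0.1, seed=1):
    """Simplified Imperialist Competitive Algorithm for minimizing `cost`."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_country():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    # 1. Initial countries; the lowest-cost ones become imperialists.
    countries = sorted((random_country() for _ in range(n_countries)), key=cost)
    imp_costs = [cost(c) for c in countries[:n_imperialists]]
    empires = [{"imp": c, "cols": []} for c in countries[:n_imperialists]]

    # 2. Share out colonies in proportion to normalized imperialist power.
    power = [max(imp_costs) - c + 1e-12 for c in imp_costs]
    for col in countries[n_imperialists:]:
        k = rng.choices(range(len(empires)), weights=power)[0]
        empires[k]["cols"].append(col)

    def total_cost(emp):
        cols = emp["cols"]
        mean_cols = sum(map(cost, cols)) / len(cols) if cols else 0.0
        return cost(emp["imp"]) + colony_weight * mean_cols

    for _ in range(n_decades):
        for emp in empires:
            moved = []
            for col in emp["cols"]:
                if rng.random() < revolution_rate:      # 3a. revolution
                    col = random_country()
                else:                                   # 3b. assimilation (beta)
                    col = [min(hi, max(lo, c + beta * rng.random() * (p - c)))
                           for c, p in zip(col, emp["imp"])]
                moved.append(col)
            emp["cols"] = moved
            # 4. A colony that beats its imperialist takes its place.
            if moved:
                best = min(moved, key=cost)
                if cost(best) < cost(emp["imp"]):
                    moved.remove(best)
                    moved.append(emp["imp"])
                    emp["imp"] = best
        # 5. Imperialistic competition (simplified: strongest empire wins).
        if len(empires) > 1:
            empires.sort(key=total_cost)
            strongest, weakest = empires[0], empires[-1]
            if weakest["cols"]:
                worst = max(weakest["cols"], key=cost)
                weakest["cols"].remove(worst)
                strongest["cols"].append(worst)
            else:                                       # 6. empire collapses
                strongest["cols"].append(weakest["imp"])
                empires.pop()

    best_emp = min(empires, key=lambda e: cost(e["imp"]))
    return best_emp["imp"], cost(best_emp["imp"])

def sphere(x):
    return sum(v * v for v in x)

best, best_cost = ica_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_cost)
```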

Fig. 6. Flowchart of the proposed algorithm in the deployment step of the DM process

5. Results and Discussions

A numerical simulation of the test specimen was carried out with the finite element package ABAQUS to verify the reliability and accuracy of the experimental work. In the simulation, the models were designed to match the test structure: 4-node homogeneous shell elements (S4R) and 8-node homogeneous linear brick solid elements (C3D8R) were used for the beams and the deck, respectively. Three damage states covering all locations (i.e., undamaged, single-type and multiple-type damage) were modeled, with damage depths of 24 mm, 48 mm and 75 mm. The numerical natural frequencies of the first four modes of the undamaged model are listed in Table 4. For clarity, the first four mode shapes of the three support beams in the multiple-type damage scenario with 75 mm damage depth are displayed in Figure 7. As the figure shows, the shapes of all modes changed as a result of the damage.

Table 4. Numerical natural frequencies of the undamaged state
1st mode (Hz) | 2nd mode (Hz) | 3rd mode (Hz) | 4th mode (Hz)
33.01 | 256.32 | 391.29 | 554.69

Fig. 7. Numerical simulation of the damaged specimen for 75 mm damage in the multiple-type damage scenario: (a) 1st mode: 31.37 Hz, (b) 2nd mode: 245.90 Hz, (c) 3rd mode: 379.49 Hz and (d) 4th mode: 552.59 Hz

A relative percentage error (ω_rpe) was used to verify the agreement between the finite element model and the experimental work on the tested structure, as given in Eq. (6).

$$\omega_{rpe} = \frac{\lvert \omega_{exp} - \omega_{num} \rvert}{\omega_{exp}} \times 100\% \tag{6}$$

where ω_exp and ω_num are the natural frequencies from the modal testing and the numerical simulation, respectively. Figure 8 compares the laboratory and numerical results for the first four modes of the single-type and multiple-type models. According to this figure, the relative percentage error was less than 2% for the third and fourth modes and less than 5% for the first and second modes. The first four natural frequencies from the numerical analysis therefore agreed well with the experiments, supporting the validity of the study.
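Eq. (6) is simple to apply mode by mode. As an illustration only, this sketch pairs the undamaged experimental frequencies of the single-type row of Table 2 with the numerical undamaged frequencies of Table 4; Figure 8 itself compares the damaged states, which are not tabulated here. The function name is hypothetical.

```python
def relative_percentage_error(omega_exp, omega_num):
    """Eq. (6): |omega_exp - omega_num| / omega_exp * 100 (%)."""
    return abs(omega_exp - omega_num) / omega_exp * 100.0

# Undamaged frequencies (Hz): Table 2 (single-type row) and Table 4.
experimental = [31.77, 248.30, 391.28, 565.45]
numerical = [33.01, 256.32, 391.29, 554.69]

for mode, (we, wn) in enumerate(zip(experimental, numerical), start=1):
    print(f"mode {mode}: {relative_percentage_error(we, wn):.2f} %")
# mode 1: 3.90 %
# mode 2: 3.23 %
# mode 3: 0.00 %
# mode 4: 1.90 %
```

For this pairing, the errors fall within the bounds stated for Figure 8: below 5% for modes 1 and 2 and below 2% for modes 3 and 4.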

Fig. 8. Comparison of the experimental and numerical results of the single and multiple damage scenarios in the first four modes: (a) mode 1, (b) mode 2, (c) mode 3 and (d) mode 4

Before discussing the models in detail, the terms "Gain" and "predictor importance" should be clarified. Gain measures the effectiveness of a model through cumulative values that start at 0% and end at 100%, and a Gain chart is a convenient way of visualizing it. In general, modeling efforts focus on the predictors that matter most, and the least important fields can be considered for dropping or ignoring. To this end, predictor importance shows the relative importance of each predictor in estimating the model, with the values for all predictors summing to 100%. In this study, an MLP was employed in the neural network model, since it is the ANN model most often used in SHM. As can be observed in Figure 9(a), the MLP provided a good fit in both the training and testing sets. The SVM model was built with four different kernel functions: RBF, polynomial, sigmoid and linear. As shown in Figure 9(b), the polynomial kernel achieved the most accurate outputs. In the CART model, a standard model with a single tree and a maximum tree depth of five was generated to explain the relationship between variables, because standard CART models are faster to score than boosted, bagged or ensemble models. The tree was pruned to avoid overfitting, with the maximum difference in risk set to 1 standard error, and this model also presented acceptable accuracy (see Figure 9(c)). The Gini index was used to measure impurity in the CART analysis. As Figure 10 shows, the third natural frequency (F3) was the most important predictor among all inputs, with an importance of around 35%, 70% and 30% for the SVM, ANN and CART models, respectively.
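The four SVM kernels named above have standard closed forms. This sketch writes them out for two feature vectors; the hyperparameter values (gamma, coef0, degree) are illustrative assumptions, since the settings used in the study are not reported, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, gamma=0.01, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

KERNELS = {"RBF": rbf_kernel, "Polynomial": polynomial_kernel,
           "Sigmoid": sigmoid_kernel, "Linear": linear_kernel}

u, v = [1.0, 2.0], [2.0, 0.5]
for name, kernel in KERNELS.items():
    print(f"{name:10s} K(u, v) = {kernel(u, v):.4f}")
```

Each kernel implicitly defines the feature map g_j(x) of Eq. (1); the polynomial kernel, which performed best here, corresponds to all monomials of the inputs up to the chosen degree.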

Fig. 9. Comparison between models: (a) ANN, (b) SVM, and (c) CART

Fig. 10. Predictor importance (%) of the natural frequencies F1–F4 in SVM, ANN, and CART models

Figure 11 compares the predicted and actual damage severities in the first four flexural modes for the three DM models used in the present study (CART, ANN and SVM). As the figure indicates, the ANN predictions fit the actual values better than those of the other models. SVM also predicted more reliably than the statistical method. In contrast, CART had the lowest prediction accuracy of all the models, because its stepwise, discrete output yields a less smooth continuous score.

Fig. 11. Comparison of actual and estimated damage severities (mm) over damage scenarios 1–26 in different models

5.1. Evaluation

This step assesses the models built in the modeling step to ensure that they correctly achieve the business objectives. Model performance should be measured by how closely the predicted values match the actual values, that is, by calculating the deviations of the predicted values from the actual values, namely the prediction errors. Different evaluation methods can be used for this purpose. In this study, prediction performance is evaluated by the Mean Absolute Error (MAE), defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{actual}_i - \mathrm{predicted}_i\right| \tag{7}$$

where n is the number of data records.
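Eq. (7), together with the correlation reported alongside it in Table 5, can be computed as in the sketch below. The correlation is assumed here to be the Pearson coefficient, which the text does not state explicitly, and the sample severities are made up for illustration rather than taken from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Eq. (7): MAE = (1/n) * sum_i |actual_i - predicted_i|."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient (assumed meaning of 'Correlation')."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up damage severities (mm), not data from this study.
actual = [15.0, 30.0, 45.0, 60.0]
predicted = [14.0, 32.0, 45.0, 57.0]
print(mean_absolute_error(actual, predicted))  # 1.5
```

Because MAE is expressed in the units of the target, the values in Tables 5 and 6 can be read directly as average deviations in damage severity.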

Table 5 shows the performance of each model on the training and testing partitions. The lowest MAE values, 1.355 for training and 2.097 for testing, were obtained by ANN. Of the three models, ANN, with the highest correlations and the lowest MAE, performed best. Although CART achieved a slightly lower training MAE than SVM (4.706 versus 5.056), SVM is recommended owing to its better overall performance, particularly on the testing set (MAE of 4.925 versus 7.200). Overall, ANN is the most suitable of the three techniques for estimating the damage severity of the test specimens.

Table 5. Summary of modeling performance

Model    MAE (training)    MAE (testing)    Correlation (training)    Correlation (testing)
SVM      5.056             4.925            0.969                     0.984
ANN      1.355             2.097            0.997                     0.994
CART     4.706             7.200            0.963                     0.883

5.2. Deployment

Creating the models is not the end of the project. In the deployment step, the knowledge obtained must be organized and presented in a form that can be used for future planning; depending on the requirements, this step can be very simple or very complex. For instance, a simple report may be sufficient, or a repeatable DM process may need to be implemented so that the knowledge can be reused. The tasks of this phase include planning deployment, planning monitoring and maintenance, producing the final report and reviewing the project. Based on the results of this study, a design scheme for structural fault diagnosis was proposed, as shown in Figure 12. In this framework, the first phase of fault diagnosis is to measure the existing damage level and then collect the sensor data. Next, the data preparation phase, one of the most important components of the entire procedure, is required to construct the database. This phase is also the most time-consuming and challenging part of the proposed process because of problems such as missing values, incomplete data, wrong data types, unavailable details and out-of-range records. The preprocessing step therefore transforms the data into inputs for the modeling phase, in which a specific algorithm is selected, trained on the data and used to create the model. After model assessment, the health condition of the structure should be improved through treatments such as upgrading or repairing structural components, replacing damaged parts, or carrying out major or minor maintenance.
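The data preparation phase described above can be sketched as a simple record filter followed by a random split. In this sketch each record is assumed to hold the first four natural frequencies (Hz) followed by the damage severity (mm); the plausibility bounds, the 30% test fraction and the function names are hypothetical choices, not values reported in this study.

```python
import math
import random

# Hypothetical plausibility bounds for natural frequencies in Hz (not from the study).
FREQ_MIN_HZ, FREQ_MAX_HZ = 0.0, 500.0


def clean_records(records, n_inputs=4):
    """Keep records of n_inputs frequencies plus one damage severity that pass basic checks."""
    cleaned = []
    for rec in records:
        if rec is None or len(rec) != n_inputs + 1:
            continue  # incomplete record
        try:
            values = [float(v) for v in rec]
        except (TypeError, ValueError):
            continue  # missing value (None) or wrong data type
        if any(math.isnan(v) for v in values):
            continue  # missing value encoded as NaN
        if not all(FREQ_MIN_HZ < f < FREQ_MAX_HZ for f in values[:n_inputs]):
            continue  # out-of-range record
        cleaned.append(values)
    return cleaned


def random_split(rows, test_fraction=0.3, seed=0):
    """Randomly divide rows into (training, testing) sets; the fraction is an assumption."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed keeps the split reproducible, which makes training and testing errors comparable across models trained on the same partitions.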

Fig. 12. Architecture of the proposed data mining procedure

Based on the evaluation step, ANN provided the best performance of all the models. However, the training of any ANN architecture is affected by factors such as the network topology, the initial weights and the type of network function. During training, the pre-developed network keeps updating its weights until a satisfactory level is reached. The pre-developed ANN also has several drawbacks, such as over-fitting and inefficient topologies, which can reduce its accuracy. Therefore, a hybrid ICA-ANN algorithm combining ANN with the imperialist competitive algorithm (ICA) was implemented as a new approach in the deployment step of the proposed model. ICA was applied in the training process of the ANN to initialize the network weights, with the main aim of minimizing the network error, the so-called cost function, by optimizing those weights. For this purpose, an ICA-ANN network was modeled for the first mode of the multiple damage scenario using the same measured natural frequencies and all the mode shape data. Accordingly, a network with 15 neurons in the input layer (the first natural frequency and 14 mode shapes) and one neuron in the output layer (damage severity) was created. As in the previous modeling, the database was randomly divided into training and testing sets. For the feedforward back-propagation network, the number of imperialists was set to 15, the number of countries to 100 and the assimilation coefficient β to 2. Figure 13 compares the predicted results with the actual values in the first flexural mode for the training and validation phases. In the figure, the red circles represent the measured data and the blue squares the predicted results; the predictions fit the measured data well. The best cost of the hybrid network is 0.0043, and the MAE values are 0.057 and 0.075 for training and testing, respectively (see Table 6). The table and figure show that ICA-ANN provides higher performance and more accurate detection than the pre-developed ANN.
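The ICA-based weight initialization can be sketched in plain Python as below. This is a simplified ICA (assimilation toward the imperialist with coefficient β, random revolution, position exchange, and imperialistic competition in which the weakest empire loses its weakest colony), applied to a hypothetical two-input, three-hidden-unit network on toy data rather than to the 15-input network and measured data of this study. The defaults of 100 countries, 15 imperialists and β = 2 mirror the settings reported above; the colony-cost weight `xi`, the revolution rate, the search bounds, the toy data and all names are assumptions. The returned vector would then serve as the initial weights for back-propagation training, which is not shown.

```python
import math
import random


def mlp_predict(w, x, n_hidden):
    """Tiny MLP: tanh hidden layer, linear output; w is a flat weight vector."""
    n_in = len(x)
    out = w[-1]  # output bias
    for h in range(n_hidden):
        base = h * (n_in + 1)  # n_in weights followed by one bias per hidden unit
        act = math.tanh(sum(w[base + j] * x[j] for j in range(n_in)) + w[base + n_in])
        out += w[n_hidden * (n_in + 1) + h] * act  # hidden-to-output weight
    return out


def ica_minimize(cost, dim, n_countries=100, n_imperialists=15, beta=2.0,
                 n_decades=30, xi=0.1, revolution_rate=0.05, bound=1.0, seed=0):
    """Simplified imperialist competitive algorithm; returns (best_vector, best_cost)."""
    rng = random.Random(seed)

    def random_country():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    countries = sorted((random_country() for _ in range(n_countries)), key=cost)
    best = countries[0]
    # The lowest-cost countries become imperialists; colonies are dealt out by power.
    empires = [{"imp": c, "cols": []} for c in countries[:n_imperialists]]
    imp_costs = [cost(e["imp"]) for e in empires]
    power = [max(imp_costs) - c + 1e-12 for c in imp_costs]
    for col in countries[n_imperialists:]:
        rng.choices(empires, weights=power)[0]["cols"].append(col)

    for _ in range(n_decades):
        for emp in empires:
            for k, col in enumerate(emp["cols"]):
                # Assimilation: move each coordinate a random fraction in [0, beta) of the gap.
                new = [c + beta * rng.random() * (i - c) for c, i in zip(col, emp["imp"])]
                if rng.random() < revolution_rate:
                    new = random_country()  # revolution: random jump
                emp["cols"][k] = new
                if cost(new) < cost(emp["imp"]):
                    emp["cols"][k], emp["imp"] = emp["imp"], new  # exchange roles
            if cost(emp["imp"]) < cost(best):
                best = emp["imp"]
        if len(empires) < 2:
            continue
        # Imperialistic competition: the weakest empire loses its weakest colony.
        totals = []
        for e in empires:
            mean_col = sum(map(cost, e["cols"])) / len(e["cols"]) if e["cols"] else 0.0
            totals.append(cost(e["imp"]) + xi * mean_col)
        weak = max(range(len(empires)), key=totals.__getitem__)
        others = [i for i in range(len(empires)) if i != weak]
        winner = rng.choices(others, weights=[max(totals) - totals[i] + 1e-12 for i in others])[0]
        loser = empires[weak]
        if loser["cols"]:
            worst = max(loser["cols"], key=cost)
            loser["cols"].remove(worst)
            empires[winner]["cols"].append(worst)
        if not loser["cols"]:  # an empire without colonies collapses
            empires[winner]["cols"].append(loser["imp"])
            empires.pop(weak)
    return best, cost(best)


# Hypothetical toy data (two inputs -> one output), not measurements from this study.
DATA = [((a / 4, b / 4), 0.5 * a / 4 - 0.3 * b / 4) for a in range(-4, 5, 2) for b in range(-4, 5, 2)]
N_HIDDEN = 3
DIM = N_HIDDEN * (len(DATA[0][0]) + 1) + N_HIDDEN + 1


def mse_cost(w):
    """Mean squared error of the toy network, used as the ICA cost function."""
    return sum((mlp_predict(w, x, N_HIDDEN) - y) ** 2 for x, y in DATA) / len(DATA)
```

For example, `weights, best_cost = ica_minimize(mse_cost, DIM)` returns a 13-element weight vector for the toy network. Seeding the random generator keeps runs reproducible, and because the best imperialist is tracked across decades, the returned cost never exceeds that of the best initial country.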

Fig. 13. Results in the first mode of the multiple damage scenario: (a) ANN, and (b) ICA-ANN

Table 6. MAE performance of the ICA-ANN and ANN models

Model      Training    Testing
ICA-ANN    0.057       0.075
ANN        1.355       2.097

6. Conclusion

Structural health monitoring techniques help ensure the integrity, serviceability and safety of civil infrastructure by providing reliable and economical approaches for identifying structural damage so that it can be maintained and repaired. Damage identification has been categorized into four levels: detecting the presence of damage in the structure, locating the damage, determining its severity and predicting the remaining service life of the structure. In this study, a data mining-based damage identification approach was developed in which the third level of damage identification, i.e., the damage severity of the slab-on-girder bridge structure, was predicted through a DM process implemented with the CRISP-DM model. To this end, experimental modal analysis was conducted in the first phase, and the resulting sensor data were used as inputs for the DM process. In the modeling phase, three techniques, i.e., SVM, ANN and CART, representing machine learning, artificial intelligence and statistical algorithms, respectively, were used to estimate the damage severity of the structure, and their performance was investigated through MAE and correlation. Building on these results, a hybrid algorithm was proposed in the deployment step of CRISP-DM. Several concluding remarks can be drawn from the obtained results, as follows.



The pre-developed ANN gave the best performance compared with SVM and CART, with the lowest MAE values of 1.355 and 2.097 for training and testing, respectively. This is attributed to the higher flexibility, autonomy and accuracy of ANN.



CART showed the lowest performance, with MAE values of 4.706 and 7.200 in the training and testing partitions, respectively, which is attributed to its limited capacity, complexity and flexibility.



Overall, the artificial intelligence method (ANN) had the highest accuracy, followed by the machine learning technique (SVM), which performed better than the statistical method (CART).



In all three models, the third natural frequency was the most important predictor among all inputs, with an importance of around 35%, 70% and 30% for the SVM, ANN and CART models, respectively. This is attributed to the location of the damage and the selected excitation point.



Among the artificial intelligence methods, ICA-ANN reduced the prediction error compared with the pre-developed network (testing MAE of 0.075 versus 2.097), indicating the robustness of the hybrid network.



Given the lack of a straightforward data mining framework in SHM and the pressing need to develop fault diagnosis of structures, a data mining-based architecture was proposed for the SHM assessment of structures.

Acknowledgements

This research was funded by the University of Malaya (UM) and the Ministry of Higher Education (MOHE), Malaysia [grant numbers IIRG007A and PG144-2016A]. The authors would like to express their sincere thanks to the Advanced Shock and Vibration Research (ASVR) Group, UM and MOHE.

Declaration of interests


☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Declarations of interest: none




The applicability of data mining to damage identification is investigated.



Experimental modal analysis of the intact and damaged test structure is carried out.



Different damage scenarios are applied to generate the modal parameters of the specimen.



The first four natural frequencies are used as inputs for the data mining process.



A data mining-based damage identification procedure is proposed for SHM.


[Graphical abstract: experimental modal analysis (excitation, response, modal parameters); database; models (artificial intelligence: ANN; machine learning: SVM; statistical: CART); checking patterns; deployment (ANN training and testing; ANN-ICA)]