A clustering approach for assessing external corrosion in a buried pipeline based on hidden Markov random field model


Structural Safety 56 (2015) 18–29

Contents lists available at ScienceDirect

Structural Safety journal homepage: www.elsevier.com/locate/strusafe

Hui Wang a, Ayako Yajima a, Robert Y. Liang a,*, Homero Castaneda b,*

a Department of Civil Engineering, The University of Akron, Akron, OH 44325-3905, USA
b Department of Material Science and Engineering, National Corrosion Center, Texas A&M University, College Station, TX 77845, USA

Article info

Article history: Received 9 May 2014; received in revised form 18 May 2015; accepted 20 May 2015

Keywords: Pipeline; External corrosion; Finite mixture model; Hidden Markov random field; Clustering; In-line inspection

Abstract

This paper describes the use of a clustering approach based on a hidden Markov random field to extract potentially homogeneous segments from a long pipeline right-of-way with heterogeneous soil properties. This approach extends the conventional finite mixture model so that the spatial correlation of external corrosion sites can be taken into consideration. An algorithm is established for classifying corrosion defects using soil properties from an in-situ survey and location information from in-line inspection reports. The categorized corrosion defects reveal the hidden patterns of corrosion degradation in different segments along a pipeline structure. Stochastic simulation is employed to test this clustering approach. An example involving a 110-km pipeline interval is employed to illustrate the implementation of the clustering approach. The results indicate that the process of external corrosion propagation in a buried pipeline is position-dependent and is highly related to the soil environment. In addition, the results show that this phenomenon can be interpreted by segmentation using the proposed clustering method. A clustering-based inspection strategy is discussed as a way to apply the present approach. © 2015 Elsevier Ltd. All rights reserved.

* Corresponding authors. Tel.: +1 (330) 972 7190; fax: +1 (330) 972 6020 (R.Y. Liang). E-mail addresses: [email protected] (H. Wang), [email protected] (A. Yajima), [email protected] (R.Y. Liang), [email protected] (H. Castaneda). http://dx.doi.org/10.1016/j.strusafe.2015.05.002. 0167-4730/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

An underground pipeline is a superior choice for the long-distance transportation of oil, gas and other products for downstream operations in the petroleum industry. However, soil corrosivity can decrease the performance of pipeline protection systems (i.e. direct protection such as coatings, as well as protection from an external source such as cathodic protection) to varying degrees and can produce external corrosion defects that may lead to failure conditions [1]. Metal deterioration in corrosive environments has been well studied for specific types of corrosion and materials during the last few decades. Several approaches (e.g., deterministic approaches [2,3], probabilistic models [4–7] and data mining approaches [8,9]) have been applied to describe corrosion propagation on metal surfaces. For the progression of pitting corrosion in pipeline structures, a comprehensive review is available in [10]. It is well known that the corrosivity of the surrounding soil as well as the physicochemical characteristics of materials will greatly affect the degradation rate [11,12].

In the context of a buried pipeline structure, because a pipeline is typically very long, the soil properties along the pipeline right-of-way could vary to a significant degree. Hence, the external corrosion propagation is position-dependent. In terms of probability and stochastic processes for assessing the corrosion propagation, a notable limitation of some previous models [13–15] is the lack of consideration of the spatial variability of soil properties. To account for the spatial variation of the soil corrosivity along a pipeline infrastructure, a practical approach is to divide the pipeline into equal segments named zones. Each zone has a set of physical and chemical properties of the soil surrounding the pipeline structure. The corrosion depth can be measured and located with an in-line inspection (ILI) device; hence, each sized defect is linked with a zone and its soil properties. All the soil properties form a high-dimensional feature space, and the defects are mapped into the feature space as vectors. Clustering techniques can then be employed to discover the intrinsic structure or hidden pattern of data points in the feature space. As the soil properties within each cluster will have high similarity, it is relatively easy to estimate the probability distribution of the corrosion depth within each cluster.

Several clustering techniques have been successfully applied to extreme wind speed analysis [16], structural vulnerability analysis [17,18] in civil engineering and external corrosion in pipeline structures without considering spatial correlation [19]. However, it is novel to go one step further and apply a clustering approach that considers spatial correlation in assessing external corrosion in a pipeline structure. Since the corrosion growth of defects that are close to each other may be similar, spatial correlation could apply a certain constraint to the corrosion propagation. The hidden Markov random field (HMRF) is introduced here to handle the spatially constrained problem because it provides a convenient way of formulating an extension of a finite mixture model for spatially dependent data [20]. The merit of the HMRF derives from Markov random field (MRF) theory, in which the spatial information is encoded through the constraints of the neighborhood configuration, as we expect neighboring defects to have the same class label and a similar corrosion degradation process. The configuration of the state sequence (i.e. the sequence of cluster labels assigned to the defects) of the HMRF is unobservable, but it can be inferred through a random field of features (i.e., the soil properties) [21]. HMRF is a powerful tool for considering spatial correlation in image segmentation [22–25], and in this work, we apply it in the context of assessing the external corrosion of a pipeline.

In the present work, hidden Markov random field theory and the finite Gaussian mixture (FGM) model are employed to develop a clustering approach named the HMRF-FGM model, which can be used to identify and extract segments with homogeneous soil properties from a long pipeline region with a heterogeneous soil environment. This approach is compared to the conventional FGM model [20] to illustrate its accuracy and robustness. By applying the HMRF-FGM model, defect depth records reported by two consecutive in-line inspections are classified into groups corresponding to the homogeneous segments. The evolution of corrosion propagation within different segments is studied.
A cluster-based inspection strategy for improving the existing maintenance policy is discussed as one of the potential applications of the present approach. The present paper is organized into six sections. In Section 2, we describe the framework of the clustering approach. A parameter study and robustness assessments of HMRF-FGM are performed in Section 3. The HMRF-FGM is applied to a real-life pipeline structure in Section 4. A clustering-based inspection strategy and some potential issues are discussed in Section 5, and conclusions are presented in Section 6.

2. Framework of the clustering approach

In the framework for the clustering approach, we consider $L = \{1, 2, 3, \ldots, l\}$ as the set of states (i.e. the cluster ID of a defect). Let $S = \{1, 2, 3, \ldots, s\}$ be a finite index set, and we shall refer to the set $S$ as the set of locations or sites of all the defects from the start point to the end point along the interval of interest in the pipeline. Now let $\mathcal{X}_j = \{x_j \mid x_j \in L\}$ be the state space of a particular site $j \in S$. Then the product space $\mathcal{X} = \prod_{j=1}^{s} \mathcal{X}_j = L^s$ is defined as the configuration space. The state values of all the sites, $x = (x_1, x_2, x_3, \ldots, x_s)$, are a configuration in the product space $\mathcal{X}$. The configuration is considered as a random field $p(x)$. Then, let us consider a second random field $p(y)$, the state space of which is also a product space and is denoted as $\mathcal{Y} = \prod_{j=1}^{s} \mathcal{Y}_j$, $\mathcal{Y}_j = \{y_j \mid y_j \in \mathbb{R}^d\}$, and $y = (y_1, y_2, y_3, \ldots, y_s)$ is a realization of $p(y)$. In the context of pipeline corrosion, the random variable $Y$ is a random vector of the soil features and the $d$-dimensional space $\mathbb{R}^d$ is the feature space formed by all soil properties. Suppose a defect at site $j$ is assigned a certain label $l$ (i.e. defect $j$ belongs to cluster $l$). The probability of the observation of local soil properties $y_j$ will follow a conditional probability distribution:

$$p(y_j \mid x_j = l) = f(y_j \mid \theta_l), \quad l \in L \tag{1}$$

where $\theta_l$ is the set of distribution parameters. It is worth noting that for all $l \in L$, the distribution family $f(\cdot\,; \theta_l)$ has the same known analytical form. In this work, we employ the finite Gaussian mixture (FGM) model as the conditional probability distribution. Before the derivation of the HMRF-FGM model, we must first briefly introduce the FGM model.

2.1. Finite Gaussian mixture model

We suppose that all the defects can be classified into $l$ clusters. The density $f(y_j)$ of $Y_j$ can be written in the form

$$f(y_j \mid \Phi) = \sum_{i=1}^{l} \pi_i f_i(y_j \mid \theta_i) \tag{2}$$

where the $\pi_i$ are nonnegative quantities that sum to one and are referred to as the mixing proportions or weights. The $\pi_i$ are independent of the individual defect $j \in S$. The $f_i(y_j \mid \theta_i)$ are referred to as the component densities. Here, we use the multivariate Gaussian distribution as the component density, i.e. $\theta_i = (\mu_i, \Sigma_i)$, $i \in L$, where $\mu$ is a vector that contains the mean of every feature and $\Sigma$ is the covariance structure of all the features, which is defined in [26]. We take $\Phi$ as the parameter set of the mixture model, $\Phi = \{\pi_i, \theta_i \mid i \in L\}$. This is the so-called finite Gaussian mixture model.

The standard finite Gaussian mixture model presented in Eq. (2) is widely used in unsupervised clustering [20]. However, it is not a complete model for a spatially constrained problem, as it only describes the data statistically and no spatial information is involved in this framework. In Eq. (2), we notice that the mixing proportions are spatially independent, which means the probability of a defect belonging to a certain cluster is homogeneous along the entire pipeline interval. To overcome this drawback of the FGM model, a clustering model that adapts to spatial information, based on the hidden Markov random field model, will be presented.

2.2. Hidden Markov random field – Finite Gaussian mixture model

A typical HMRF has the following three characteristics [25]:

(1) The configuration of all sites $x = (x_1, x_2, x_3, \ldots, x_s)$, i.e. the clustering result, is unobservable.
(2) The observable random field $p(y \mid x)$ is called the observed or emitted random field given any particular configuration $x = (x_1, x_2, x_3, \ldots, x_s)$.
(3) A typical conditionally independent assumption is adopted that has the expression:

$$p(y \mid x) = \prod_{j=1}^{s} p(y_j \mid x_j) \tag{3}$$

The conditionally independent assumption still guarantees the Markovianity of the random field $p(x, y)$ [21]. More complicated assumptions might also be considered [20]. Based on the above, the joint probability of $(X, Y)$, i.e. a clustering configuration and the corresponding observations of soil properties, has the form

$$P(x, y) = P(x) P(y \mid x) = P(x) \prod_{j=1}^{s} p(y_j \mid x_j) \tag{4}$$
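As a minimal sketch of Eqs. (2) and (3), the mixture density and the conditionally independent emission log-likelihood can be evaluated with NumPy as follows. The function names (`gaussian_logpdf`, `fgm_density`, `emission_loglik`) are hypothetical and are not taken from the paper, and the sketch assumes the standard $d$-variate Gaussian normalization, which may differ by a constant from the convention used later in the text.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log-density of a d-variate Gaussian N(mean, cov) at each row of y."""
    y = np.atleast_2d(y)
    d = y.shape[1]
    diff = y - mean
    # Solve a linear system instead of inverting the covariance explicitly.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def fgm_density(y, weights, means, covs):
    """Eq. (2): sum_i pi_i * f_i(y | theta_i), evaluated at each row of y."""
    comps = [w * np.exp(gaussian_logpdf(y, m, c))
             for w, m, c in zip(weights, means, covs)]
    return np.sum(comps, axis=0)

def emission_loglik(y, labels, means, covs):
    """Eq. (3) in log form: sum_j log p(y_j | x_j) for a given labeling."""
    return sum(gaussian_logpdf(y[j], means[l], covs[l])[0]
               for j, l in enumerate(labels))
```

Note that `fgm_density` uses the same weights at every site, which is exactly the spatial independence that the HMRF extension is meant to relax.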

According to Gibbs models [22], the probability distribution function of a certain configuration of sites $x = (x_1, x_2, x_3, \ldots, x_s)$ has the form:

$$P(x) = \frac{1}{Z} \exp\left(-U(x)/T\right) \tag{5}$$

where $Z$ is a normalizing constant called the partition function, $T$ stands for temperature (which we will discuss in more detail later), and $U$ is the energy function having the form:

$$U(x) = \sum_{c \in C} V_c(x_i, x_j), \quad i, j \in c,\ i \neq j \tag{6}$$

where $c$ is a clique (i.e. two consecutive sites $j$ and $j + 1$ in the context of a one-dimensional Markov random field) and $C$ is the set of all cliques within $S$. $V_c$ is referred to as the clique potential, which depends on the local configuration of $c$ and has the following expression:

$$V_c(x_i, x_j) = -\beta \cdot \delta(x_i - x_j) \tag{7}$$

where $\delta(\cdot)$ stands for the Kronecker delta function

$$\delta(x_i - x_j) = \begin{cases} 1, & \text{if } x_i = x_j \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

Because of the one-dimensional spatial relationship among the corrosion defects along the longitudinal direction of a pipeline structure, we consider the random field $p(x)$ as an ordinary one-dimensional Markov chain relative to a neighborhood system on $S$. Let $\partial$ denote the neighborhood system, i.e. a collection $\partial = \{\partial_j \mid j \in S\}$ of sets $\partial_j = \{s : |s - j| \leq \lambda,\ s \neq j\}$, where the integer $\lambda$ is the order of the Markov chain. The "two-sided" Markov property is shown in Eq. (9):

$$P(X_k = x_k \mid X_j = x_j,\ j \neq k) = P(X_k = x_k \mid X_j = x_j,\ j \in \partial_k) \quad \forall k \in S \tag{9}$$

In other words, the sites of corrosion defects in $S$ are related to each other via a neighborhood system in terms of spatial constraint. We are interested in finding the conditional probability of the configuration at a certain site given a neighborhood system, $P(x_j \mid x_{\partial_j})$. According to the "two-sided" Markov property [22,23],

$$P(x_j \mid x_{\partial_j}) = \frac{P(x_j, x_{S - \{j\}})}{\sum_{x_j \in L} P(x_j, x_{S - \{j\}})} = \frac{P(x_j, x_{\partial_j})}{\sum_{x_j \in L} P(x_j, x_{\partial_j})} = \frac{\exp\left(-\sum_{c \in \partial_j} V_c / T\right)}{\sum_{x_j \in L} \exp\left(-\sum_{c \in \partial_j} V_c / T\right)} \tag{10}$$

where the clique $c$ in Eq. (10) is defined as a pair of sites $(j, s)$, $s \in \partial_j$, to reflect the spatial constraint given the neighborhood system $\partial_j$ of site $j$. The joint probability of a pair $(X_j, Y_j)$, i.e. the cluster label $X_j$ of a defect at site $j$ and the corresponding soil properties $Y_j$, given the local neighborhood configuration $x_{\partial_j}$, has the expression:

$$P(x_j, y_j \mid x_{\partial_j}) = P(x_j \mid x_{\partial_j}) P(y_j \mid x_j) \tag{11}$$

In the hidden Markov random field, the configuration of all sites is unobservable in advance. We consider the marginal probability of $Y_j$ based on Eq. (11)

$$P(y_j \mid x_{\partial_j}) = \sum_{l \in L} P(l \mid x_{\partial_j}) f(y_j \mid \theta_l) \tag{12}$$

where $\theta_l = (\mu_l, \Sigma_l)$ is the parameter set of the Gaussian component distribution function. Eq. (12) is the so-called hidden Markov random field – finite Gaussian mixture (HMRF-FGM) model. Returning to the finite Gaussian mixture model, we can compare Eq. (2) with Eq. (12). It is worth noting that the mixing proportions $P(l \mid x_{\partial_j})$ in HMRF-FGM are determined by the local neighborhood configurations, whereas the mixing proportions $\pi_i$ in FGM do not change along the length of a pipeline structure. If we assume the site configurations $X_j$ are independent of each other:

$$P(l \mid x_{\partial_j}) = P(l) = \pi_l \tag{13}$$

then Eq. (12) reduces to an FGM model. From this point of view, FGM is a reduced case of HMRF-FGM. Therefore, the HMRF-FGM model is able to encode both spatial and statistical information. It is more realistic for external corrosion modeling than FGM, since the local soil environment may cause similar corrosion propagation.

The parameters $\theta_l = (\mu_l, \Sigma_l)$ can be optimized with the expectation–maximization (EM) algorithm [27,28], which is a method for finding the maximum likelihood by optimizing the parameters under hidden conditions [29]. The EM algorithm for the HMRF is considerably more difficult, because it is typically impossible to exactly compute the log-likelihood [20]. One solution is to replace the E-step with a stochastic restoration step [30]. The idea is to update the site configuration $x = (x_1, x_2, x_3, \ldots, x_s)$ by using the maximum a posteriori (MAP) estimation. We will explain this algorithm in detail in Section 2.4, following an illustration of the MAP estimation process.

2.3. Maximum a posteriori (MAP) estimation

Given a realization of the random field $p(y)$ (i.e., we have a set of soil properties along a pipeline structure), we aim to group together the corrosion defects that have a similar soil environment. In other words, we want to get a configuration of all sites $\hat{x}$, which is an estimation of the true classification $x^*$ (which is unknown). According to Bayes' theorem:

$$P(x \mid y) \propto P(x) P(y \mid x) \tag{14}$$

where $P(x) \propto \exp(-U(x)/T)$ and $P(y \mid x) \propto \exp(-U(y \mid x))$. $U(y \mid x)$ is the likelihood energy given the configuration $x$ and has the form:

$$U(y \mid x) = \sum_{j=1}^{s} \left[ \frac{1}{2} (y_j - \mu_{x_j})^{T} \Sigma_{x_j}^{-1} (y_j - \mu_{x_j}) + \frac{1}{2} \log|\Sigma_{x_j}| \right] \tag{15}$$

Eq. (15) comes from the joint likelihood function, which is based on the conditionally independent assumption:

$$P(y \mid x) = \prod_{j=1}^{s} P(y_j \mid x_j) = \prod_{j=1}^{s} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\left[ \frac{1}{2} (y_j - \mu_{x_j})^{T} \Sigma_{x_j}^{-1} (y_j - \mu_{x_j}) + \frac{1}{2} \log|\Sigma_{x_j}| \right] \right\} \tag{16}$$

Hence $p(x \mid y) \propto \exp\left(-\left(U(x)/T + U(y \mid x)\right)\right)$. $U(x)/T + U(y \mid x)$ is called the total posterior energy. According to the MAP criteria:

$$\hat{x} = \arg\max_{x \in \mathcal{X}} \{P(x) P(y \mid x)\} \tag{17}$$

Eq. (17) is equivalent to minimizing the total posterior energy function:

$$\hat{x} = \arg\min_{x \in \mathcal{X}} \{U(x)/T + U(y \mid x)\} \tag{18}$$
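The quantities entering Eq. (18) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names are hypothetical, labels are assumed to be coded 0, 1, ..., and the pair potential is assumed to be $V_c = -\beta\,\delta(x_i - x_j)$ so that equal neighboring labels lower the energy.

```python
import numpy as np

def neighbors(j, s, lam):
    """Sites within distance lam of site j on a chain of s sites, excluding j."""
    return [i for i in range(max(0, j - lam), min(s, j + lam + 1)) if i != j]

def local_prior(x, j, n_labels, lam=2, beta=1.0, T=1.0):
    """Eq. (10): P(x_j = l | neighbors) for every label l."""
    x = np.asarray(x)
    nb = x[neighbors(j, len(x), lam)]
    # Energy of label l is -beta times the number of neighbors carrying label l.
    energy = np.array([-beta * np.sum(nb == l) for l in range(n_labels)])
    w = np.exp(-(energy - energy.min()) / T)  # shifted for numerical stability
    return w / w.sum()

def prior_energy(x, lam=2, beta=1.0):
    """Eq. (6): sum of pair potentials over all pairs at distance <= lam."""
    x = np.asarray(x)
    return sum(-beta * np.sum(x[:-k] == x[k:])
               for k in range(1, lam + 1) if k < len(x))

def likelihood_energy(y, x, means, covs):
    """Eq. (15): sum_j [0.5 * Mahalanobis term + 0.5 * log|Sigma_{x_j}|]."""
    total = 0.0
    for yj, l in zip(y, x):
        diff = yj - means[l]
        total += 0.5 * diff @ np.linalg.solve(covs[l], diff)
        total += 0.5 * np.linalg.slogdet(covs[l])[1]
    return total
```

The total posterior energy of a candidate labeling is then `prior_energy(x, lam, beta) / T + likelihood_energy(y, x, means, covs)`. Lowering `T` in `local_prior` sharpens the preference for the label carried by most neighbors.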

It is computationally infeasible to perform trials of all the possible configurations. Therefore, an iterative optimization technique is needed here to update $\hat{x}$ from an initial estimation $x^0$. One practical way is to update the state $X_j$ of only one site $j$ at a time, proceeding from the first site to the last site along a pipeline structure. This procedure defines a single cycle of an iterative algorithm for estimating $x^*$, and this is the main idea of the algorithm referred to as iterated conditional modes (ICM) [23]. Within the $t$th iteration, a plausible choice of the updated cluster label $x_j^{(t)}$ of a site $j$ is the cluster ID $l \in L$ that minimizes the local posterior energy function, given the soil properties at site $j$ and the current local configuration:

$$x_j^{(t)} = \arg\min_{l \in L} \left\{ \frac{1}{2} (y_j - \mu_l^{(t)})^{T} (\Sigma_l^{(t)})^{-1} (y_j - \mu_l^{(t)}) + \frac{1}{2} \log|\Sigma_l^{(t)}| + \sum_{i \in \partial_j} \left[ -\beta \cdot \delta(l - \tilde{x}_i) / T(t) \right] \right\} \tag{19}$$

where $\tilde{x}_i$ is the current configuration of the neighbors of site $j$: for $i < j$, $\tilde{x}_i = x_i^{(t)}$; for $i > j$, $\tilde{x}_i = x_i^{(t-1)}$.

Fig. 1. ICM-EM algorithm for clustering based on the HMRF-FGM model. The convergence criterion is met when the relative difference between two successive values of the indicator variable (i.e. $U(x)/T(t) + U(y \mid x)$ or the Q-function value) falls below a small threshold.

In the framework of ICM, the local conditional probability $P(x_j \mid x_{\partial_j})$ depends on a global control parameter $T$ (as was defined in Eq. (10)). At low temperatures, the local conditional probability tends to strengthen the spatial constraint, whereas at high temperatures the random field is less constrained (i.e., the field tends to be position-independent). A sensible strategy is to set a high temperature in the early cycles, so that labels are not fixed too early on the basis of an unreliable initial estimation, and then to decrease the temperature $T$ slowly. This process is also called annealing. In the $t$th iteration, the temperature satisfies the following expression [22]:

$$T(t) \geq \frac{c}{\log(1 + t)} \tag{20}$$

where $c$ is a constant independent of the loop index $t$; in the context of our problem, $c = \beta$. Though it requires a few more iterations, the result is more robust than that of a scheme that does not consider temperature. For simplicity and without losing rationality, we use the boundary value, which means:

$$\frac{\beta}{T(t)} = \log(1 + t) \tag{21}$$

For practical purposes, the temperature must not keep decreasing without control. If it does, it will result in a single-cluster solution, which is meaningless for our problem. Hence, a maximum number of iterative cycles $t^*$ for annealing should be set in advance. Then, Eq. (19) becomes:

$$x_j^{(t)} = \arg\min_{l \in L} \left\{ \frac{1}{2} (y_j - \mu_l^{(t)})^{T} (\Sigma_l^{(t)})^{-1} (y_j - \mu_l^{(t)}) + \frac{1}{2} \log|\Sigma_l^{(t)}| + \sum_{i \in \partial_j} \left[ -\delta(l - \tilde{x}_i) \cdot \log(1 + \tilde{t}) \right] \right\} \tag{22}$$

where

$$\tilde{t} = \begin{cases} t, & t \leq t^* \\ t^*, & t > t^* \end{cases}$$

In general, a relatively weaker $p(x)$ should be used in ICM [23]; in our problem, a value of $t^* = 7$ seems to be appropriate because, after seven iterative loops, the decrement of the local posterior energy is no longer significant. After each cycle of iteration, we estimate the updated total posterior energy. This strategy converges quickly to a local minimum of the total posterior energy, and convergence is guaranteed [23].
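One ICM cycle of Eq. (22) might be sketched as below. This is an assumption-laden illustration, not the authors' implementation: `icm_sweep` is a hypothetical name, labels are coded 0, 1, ..., and the in-place update is what makes sites $i < j$ use their labels from the current cycle while sites $i > j$ still carry labels from the previous one.

```python
import numpy as np

def icm_sweep(y, x, means, covs, t, lam=2, t_star=7):
    """One ICM cycle over all sites using Eq. (22); returns updated labels."""
    x = np.array(x, copy=True)
    s, n_labels = len(x), len(means)
    t_eff = min(t, t_star)            # annealing stops after t_star cycles
    strength = np.log(1.0 + t_eff)    # beta / T(t), from Eq. (21)
    inv = [np.linalg.inv(c) for c in covs]
    half_logdet = [0.5 * np.linalg.slogdet(c)[1] for c in covs]
    for j in range(s):                # first site to last site
        lo, hi = max(0, j - lam), min(s, j + lam + 1)
        # x[lo:j] was already updated in this cycle; x[j+1:hi] was not.
        nb = np.concatenate([x[lo:j], x[j + 1:hi]])
        energy = np.empty(n_labels)
        for l in range(n_labels):
            diff = y[j] - means[l]
            energy[l] = (0.5 * diff @ inv[l] @ diff + half_logdet[l]
                         - strength * np.sum(nb == l))
        x[j] = np.argmin(energy)
    return x
```

In a small one-feature example with means 0 and 1, a site whose observation of 0.6 slightly favors cluster 1 is pulled back to cluster 0 when its four neighbors all carry label 0, which is the smoothing effect the spatial term is meant to provide.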

2.4. ICM-EM clustering algorithm

In the HMRF-FGM model, the class label of each site $X_j$ and the parameters of each cluster $\theta_l = (\mu_l, \Sigma_l)$ are interdependent. At the very beginning, both the site configuration and the distribution parameters are unknown. The data set is said to be "incomplete" and the clustering problem is regarded as an "incomplete-data" problem [25].

Fig. 2. Simulation result of a grouping configuration. (a) Synthetic configuration (cluster ID versus defect ID): in this example, there are 500 corrosion defects, which are classified into four clusters; for illustration purposes, two realizations from the emission distributions are shown against defect ID: (b) half-cell potential (Eh); (c) soil resistivity (Ω·cm).

Fig. 3. Classification results for three different neighborhood systems using the ICM-EM algorithm (cluster ID versus defect ID). (a) Original classification; (b) $\lambda = 1$; (c) $\lambda = 4$; (d) $\lambda = 7$; (e) effect of $\lambda$ on the accuracy of clustering results (MCR versus $\lambda$).

The ICM-EM algorithm is based on the HMRF-FGM model. The ICM-EM algorithm that is employed to solve the "incomplete-data" problem is a modification of the conventional EM algorithm. Given the previous estimation of $x^{(t-1)} = (x_1^{(t-1)}, x_2^{(t-1)}, x_3^{(t-1)}, \ldots, x_s^{(t-1)})$ and the current estimation of $\Theta^{(t)} = (\theta_1^{(t)}, \theta_2^{(t)}, \theta_3^{(t)}, \ldots, \theta_l^{(t)})$, this strategy mainly contains two steps within a loop of iteration. First, in the E-step, $x^{(t-1)}$ is updated to $x^{(t)}$ by employing ICM and taking $\Theta^{(t)}$ and the observed soil properties $y$ as inputs. Having a current configuration $x^{(t)}$, the posterior probability $P^{(t)}(l \mid y_j)$ is evaluated for all sites, and it has the following expression:

$$P^{(t)}(l \mid y_j) = \frac{P^{(t)}(l \mid x_{\partial_j}) f(y_j \mid \theta_l^{(t)})}{\sum_{l \in L} P^{(t)}(l \mid x_{\partial_j}) f(y_j \mid \theta_l^{(t)})} \tag{23}$$

Second, in the M-step, the updated estimates of parameters $\Theta^{(t+1)}$ are calculated by the maximization of the conditional expectation function, which is referred to as the Q-function, based on the posterior probability $P^{(t)}(l \mid y_j)$ and current $\Theta^{(t)}$:

$$Q = \sum_{j \in S} \sum_{l \in L} P(l \mid y_j) \left[ \log P(l \mid x_{\partial_j}) - \frac{1}{2} \log|\Sigma_l| - \frac{1}{2} \log(2\pi) - \frac{1}{2} (y_j - \mu_l)^{T} \Sigma_l^{-1} (y_j - \mu_l) \right] \tag{24}$$

Actually, we perform this maximization approximately under the conditional independent assumption. Initial estimates are generated by employing a conventional EM algorithm. In the E-step, we ignore the spatial constraint and merely estimate $P(l \mid y_j)$ at each site $j$ independently. A detailed description of the EM algorithm is available in the literature [20].
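The posterior of Eq. (23) and a parameter update that maximizes the Gaussian terms of Eq. (24) can be sketched as follows. The names `posterior` and `m_step` are hypothetical, `prior[j, l]` is assumed to hold the site-specific mixing proportion $P(l \mid x_{\partial_j})$ computed beforehand, and the weighted mean and covariance are the standard closed-form Gaussian maximizers, offered here as an assumption about how the approximate maximization could be carried out.

```python
import numpy as np

def posterior(y, prior, means, covs):
    """Eq. (23): P(l | y_j) proportional to prior[j, l] * f(y_j | theta_l)."""
    s, n_labels = prior.shape
    d = y.shape[1]
    log_w = np.empty((s, n_labels))
    for l in range(n_labels):
        diff = y - means[l]
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(covs[l]), diff)
        logdet = np.linalg.slogdet(covs[l])[1]
        log_w[:, l] = (np.log(prior[:, l])
                       - 0.5 * (maha + logdet + d * np.log(2.0 * np.pi)))
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

def m_step(y, post):
    """Posterior-weighted mean and covariance for each cluster."""
    means, covs = [], []
    for l in range(post.shape[1]):
        w = post[:, l]
        mu = w @ y / w.sum()
        diff = y - mu
        covs.append((w[:, None] * diff).T @ diff / w.sum())
        means.append(mu)
    return means, covs
```

Passing a `prior` whose rows are all equal to the global weights $\pi_l$ recovers the E-step of a conventional FGM fit, which is how the initial estimates described above could be produced.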

Table 1a. The original parameters of the emission distributions.

| Property | C1 (a) mean | C1 variance | C2 mean | C2 variance | C3 mean | C3 variance |
|---|---|---|---|---|---|---|
| Eh (b) | 2.68E+02 | 3.08E+03 | 2.23E+02 | 2.76E+03 | 2.97E+02 | 1.88E+03 |
| Resistivity | 8.81E+03 | 2.56E+07 | 1.97E+03 | 1.64E+06 | 5.86E+04 | 3.19E+09 |
| pH | 5.21E+00 | 3.90E-01 | 6.11E+00 | 3.46E-01 | 4.30E+00 | 6.29E-01 |
| [CO3^2-] (c) | 1.00E+00 | 6.61E-01 | 1.06E+00 | 4.60E-01 | 1.35E-01 | 8.58E-02 |
| [HCO3^-] | 2.57E+00 | 3.06E+00 | 3.10E+00 | 2.25E+00 | 2.03E+00 | 1.18E+00 |
| [Cl^-] | 2.72E+00 | 7.01E-01 | 3.75E+00 | 8.01E-01 | 2.11E+00 | 2.04E-01 |
| [SO4^2-] | 6.67E-02 | 1.65E-03 | 1.10E-01 | 4.50E-03 | 5.25E-02 | 3.44E-04 |

(a) C = cluster. (b) Eh = half-cell potential. (c) [.] = ion concentration.

Table 1b. The estimated parameters of the emission distributions.

| Property | C1 mean | C1 variance | C2 mean | C2 variance | C3 mean | C3 variance |
|---|---|---|---|---|---|---|
| Eh | 2.65E+02 | 3.29E+03 | 2.24E+02 | 2.76E+03 | 2.97E+02 | 2.06E+03 |
| Resistivity | 8.20E+03 | 2.67E+07 | 2.08E+03 | 1.72E+06 | 6.17E+04 | 3.31E+09 |
| pH | 5.25E+00 | 3.71E-01 | 6.16E+00 | 3.37E-01 | 4.38E+00 | 6.45E-01 |
| [CO3^2-] | 1.12E+00 | 6.61E-01 | 1.17E+00 | 4.60E-01 | 9.76E-02 | 1.14E-01 |
| [HCO3^-] | 2.74E+00 | 2.98E+00 | 3.03E+00 | 2.94E+00 | 1.87E+00 | 1.12E+00 |
| [Cl^-] | 2.77E+00 | 6.68E-01 | 3.69E+00 | 7.94E-01 | 2.09E+00 | 1.62E-01 |
| [SO4^2-] | 6.91E-02 | 1.69E-03 | 1.01E-01 | 3.10E-03 | 5.05E-02 | 3.54E-04 |

When the convergence criteria are satisfied, the initial configuration $x^0$ is determined by Eq. (25)

$$x_j = \arg\max_{l \in L} \{P(l \mid y_j)\} \tag{25}$$

Without loss of generality, the ICM-EM algorithm for estimating the actual configuration $x^*$ and the parameters of each component distribution is shown in Fig. 1.

2.5. The number of components

A proper number of components can cause the data points to have high similarities within each cluster and low inter-similarities between clusters within the feature space [20,26]. Several studies have been performed on model selection, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC) [28], the heterogeneity test [16], and a Bayesian approach combined with Monte Carlo integration [31]. The BIC considers the dimension of input space by including the number of features as a penalty [28,29], and thus the BIC is more sensitive to model complexity. We used the BIC in the present study in order to determine the number of components. The BIC has the form:

$$\mathrm{BIC} = 2\,\mathrm{loglike}(x, \theta) - M \log(n) \tag{26}$$

where $\mathrm{loglike}(x, \theta)$ is the maximized log-likelihood, $M$ is the number of independent parameters to be estimated, and $n$ is the number of data points. Previous studies recommend choosing the model at the maximum BIC [28,29,32,33]; however, the number of components chosen could be too large or too small, and this could lead to overfitting or to misclassification. Fraley and Raftery [28] suggested selecting the number of components where the BICs for all covariance models seem to converge. For the present study, we chose the optimal number of clusters at the point where the increment of the BIC begins to slow down. Because spatial correlation is not reflected in the feature space, we are able to choose a number that can maximize the likelihood of observations without considering spatial constraint. We employ a conventional FGM model to find the best number of components, since it avoids performing ICM within each iterative loop and thus is faster for trying many cases with a different number of components. We will provide further illustration in Section 4.

3. Numerical experiment

3.1. Simulation of hidden Markov random field

Numerical simulation is often used to verify stochastic models. In the context of assessing soil corrosivity, we consider there are several different groups of corrosion defects, and the set of all external corrosion defects within a pipeline interval is a Markov random field $p(x)$ with the state set $L = \{1, 2, 3, \ldots, l\}$. Each defect with its corresponding soil features indicates the local soil corrosivity. Simulation is used to generate a synthetic grouping configuration $x = (x_1, x_2, x_3, \ldots, x_s)$ and we consider this as the original configuration that is used for comparison with the estimated result from the HMRF-FGM model. Here, a Gibbs sampler proposed by Geman and Geman [22] is applied. Based on the generated realization $x = (x_1, x_2, x_3, \ldots, x_s)$ of $p(x)$, at each site, the observation $y_j$ (i.e. a set of soil properties) is sampled from the conditional Gaussian emission distribution $f(y_j \mid \theta_l, x_j = l)$. Seven well-known critical soil properties related to corrosion are considered in this study: half-cell potential, soil resistivity, pH, and the concentrations of four different ions (CO3^2-, HCO3^-, Cl^-, SO4^2-). All the distribution parameters (i.e. means and covariance structures) related to the seven properties are estimated from a data set extracted from a real-world survey along a pipeline interval. Fig. 2 shows an example of a synthetic configuration and its emission realization (i.e. the simulation results of soil properties), in which the random field has four different clusters. In a real-world problem, the emission realization $y$ of random field $p(y)$ is obtained by soil survey at in-situ sites, and we want to reveal the hidden state sequence $x = (x_1, x_2, x_3, \ldots, x_s)$.
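A simulation of this kind can be sketched as follows: labels are drawn by Gibbs sampling from the local conditional probability of Eq. (10), and one Gaussian feature vector is then drawn per site. The function name `simulate_hmrf`, the default settings, and the random initial labeling are assumptions made for illustration; they are not the settings used to produce Fig. 2.

```python
import numpy as np

def simulate_hmrf(s, means, covs, lam=2, beta=1.0, T=1.0, sweeps=50, seed=0):
    """Gibbs-sample labels x on a 1-D chain, then draw Gaussian emissions y."""
    rng = np.random.default_rng(seed)
    n_labels = len(means)
    x = rng.integers(n_labels, size=s)   # random initial configuration
    for _ in range(sweeps):
        for j in range(s):
            lo, hi = max(0, j - lam), min(s, j + lam + 1)
            nb = np.concatenate([x[lo:j], x[j + 1:hi]])
            counts = np.array([np.sum(nb == l) for l in range(n_labels)])
            # Eq. (10) with V_c = -beta * delta, shifted for stability.
            p = np.exp(beta * (counts - counts.max()) / T)
            x[j] = rng.choice(n_labels, p=p / p.sum())
    # Conditionally independent emissions, one feature vector per site.
    y = np.stack([rng.multivariate_normal(means[l], covs[l]) for l in x])
    return x, y
```

For example, `simulate_hmrf(500, means, covs)` with four entries in `means` and `covs` would return 500 labels and a 500-by-d feature matrix; larger `beta / T` or larger `lam` tends to produce longer runs of identical labels.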

Fig. 4. Classification results via FGM and HMRF-FGM for three different covariance structures (cluster ID versus zone ID), where $\Sigma_0$ is the reference covariance structure: (a) original classification; (b) $\Sigma = 0.5\Sigma_0$; (c) $\Sigma = 4\Sigma_0$; (d) $\Sigma = 20\Sigma_0$; (e) effect of covariance structure on the accuracy of clustering results (MCR versus the multiple of $\Sigma_0$, for HMRF-FGM and FGM).

3.2. The effect of the order λ on clustering result

In implementing the HMRF-FGM model, it is important to choose a proper neighborhood system. In the context of a one-dimensional Markov random field, it is the parameter $\lambda$ that determines the size of neighborhoods. To measure the quality of a classification, we define the misclassification ratio (MCR) in the form:

$$\mathrm{MCR} = \frac{\text{number of misclassified sites}}{\text{total number of sites}} \tag{27}$$

Cluster analyses with $\lambda$ ranging from 1 to 7 are performed. Fig. 3 shows some of the classification results and the relationship between $\lambda$ and MCR. It is worth noting that in Fig. 3, a relatively small $\lambda$ can preserve more "details" of the original classification, which means some special sites that are singular compared with their neighbors can be detected and reflected in the estimated classification. However, as $\lambda$ increases, the spatial constraint is strengthened, which results in a more consistent classification; hence some singular sites may be misclassified. Based on the result illustrated in Fig. 3(e), a good choice of $\lambda$ would be smaller than 4; in this paper, $\lambda = 2$ is adopted. The original and estimated parameters of emission distributions when $\lambda = 2$ are shown in Tables 1a and 1b. The estimated results are acceptable when compared to the original parameters.

3.3. The robustness of ICM-EM algorithm

Another issue we confront when we employ a clustering method is that the classification quality may deteriorate as the variance of observations increases and becomes comparable with the difference between the means of different clusters. Some clustering approaches may be sensitive to the extent of scatter of the data points in feature space under some conditions [25,28]; these approaches include k-means clustering, FGM modeling, and density-based clustering techniques. One possible reason for this sensitivity is that these approaches lack a consideration of spatial constraint.

Fig. 5. Measured defect depth from ILI reports of 2005 and 2010: (a) density histograms of the defect depth (%WT) extracted from the ILI reports of the two years (2005 MFL and 2010 UT); (b) scatter plot of the reported metal loss (defect depth, %WT) against mileage (m) along the pipeline interval. The annotations in panel (b) mark one zone in which ILI could not be implemented and one zone in which only MFL was conducted.

Because the HMRF-FGM model applies spatial constraint to each site via a neighborhood system, the ICM-EM algorithm is more robust than conventional clustering methods. To demonstrate this advantage of the ICM-EM algorithm, a stochastic simulation was performed using a given synthetic state configuration but different covariance structures of the emission functions. In this numerical experiment, the reference covariance structure R0 was estimated from the real-world survey data mentioned above. Seven different covariance structures are generated by multiplying the reference covariance structure by a constant. Some classification results via FGM and HMRF-FGM are shown in Fig. 4. The clusters obtained with the FGM model become mixed together as the variances of the random variables increase, as can be seen in Fig. 4(a)–(d). However, the HMRF-FGM model is able to hold the spatial configuration well when the statistical separability deteriorates. Fig. 4(e) compares the MCR of the results via FGM and HMRF-FGM, and it shows that HMRF-FGM is more robust than the finite Gaussian mixture model.
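The effect described above can be reproduced in spirit with a much simplified sketch. It is not the ICM-EM algorithm of this paper: the emission means and the shared covariance are assumed known, so only the ICM labelling step is shown, and the unconstrained nearest-mean classifier stands in for the FGM classification. The Potts-type penalty with weight beta, the synthetic configuration, and the choice R0 = 0.25 I are all illustrative assumptions.

```python
import numpy as np


def emission_energy(x, means, cov):
    """Half squared Mahalanobis distance of every site to every cluster mean.

    With one covariance shared by all clusters this equals the negative
    Gaussian log-density up to a constant that does not depend on the cluster.
    """
    inv = np.linalg.inv(cov)
    diff = x[:, None, :] - means[None, :, :]  # (sites, clusters, features)
    return 0.5 * np.einsum("nkd,de,nke->nk", diff, inv, diff)


def icm_labels(x, means, cov, k=2, beta=1.0, n_sweeps=20):
    """ICM labelling on a 1-D chain; each site has up to k neighbours per side."""
    energy = emission_energy(x, means, cov)
    labels = energy.argmin(axis=1)  # start from the unconstrained labels
    n_sites, n_clusters = energy.shape
    for _ in range(n_sweeps):
        changed = False
        for i in range(n_sites):
            lo, hi = max(0, i - k), min(n_sites, i + k + 1)
            neigh = np.concatenate([labels[lo:i], labels[i + 1:hi]])
            # Potts-type penalty: neighbours disagreeing with each candidate label
            disagree = np.array([(neigh != c).sum() for c in range(n_clusters)])
            best = int(np.argmin(energy[i] + beta * disagree))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels


def simulate_mcr(scale, seed=0):
    """MCR without and with the spatial term for covariance scale * R0."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1, 2, 1, 0], 60)  # synthetic segmented configuration
    means = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
    cov = scale * 0.25 * np.eye(2)  # R0 = 0.25 I is an arbitrary choice
    x = means[truth] + rng.multivariate_normal(np.zeros(2), cov, size=truth.size)
    plain = emission_energy(x, means, cov).argmin(axis=1)
    spatial = icm_labels(x, means, cov, k=2, beta=1.0)
    return float(np.mean(plain != truth)), float(np.mean(spatial != truth))


for scale in (0.5, 4.0, 20.0):
    print(scale, simulate_mcr(scale))
```

Each printed pair gives the MCR of the unconstrained labels followed by that of the spatially constrained labels. Increasing k or beta strengthens the spatial constraint in the same sense as discussed in Section 3.2.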

4. Example

4.1. General information

As an example, we apply the HMRF-FGM model to a real-life underground pipeline system. The pipeline was originally installed in 1969. The total length of the inspected interval is about 110 km. The diameter is 457.2 mm, and the wall thickness is 6.41 mm. Two in-line inspections (ILI) were conducted in 2005 and 2010 via two different technologies: magnetic flux leakage (MFL) in 2005 and ultrasonic testing (UT) in 2010. For some reason, ILI was not implemented on a sub-interval within the pipeline system in either 2005 or 2010. The inspection reports include the location, type and size of the defects. A total of 1132 external defects were reported in 2005, while 1931 external defects were reported in 2010. The histograms of defect depth and the location information are shown in Fig. 5, in which %WT represents the defect depth relative to the wall thickness, on a scale from 0 (for a wall that is completely intact) to 1 (for a wall that has failed). It is interesting to note that the histogram of the 2005 data seems to follow an exponential distribution; however, no defect shallower than 10% of the wall thickness was detected. This is because the detectability of the MFL inspection device is limited; when the defect depth is smaller than a threshold level, the probability of detection is so low that a shallow corrosion defect is nearly impossible to detect [34]. Hence, the data from 2005 are truncated, and the exponential assumption may not be true. The data from 2010 are incomplete, as some segments were not inspected in 2010.

A soil survey was conducted every 200 m along the right-of-way of the pipeline in 2012, and we refer to the survey points as stations. Seven soil properties (i.e. half-cell potential, resistivity, pH, and the concentrations of CO₃²⁻, HCO₃⁻, Cl⁻, and SO₄²⁻) and some other soil properties were either measured or estimated. After performing correlation analysis, some properties that were highly dependent on other properties were excluded. Only the seven soil properties listed above are taken into consideration in the analysis. Three critical soil properties (i.e. resistivity, pH and concentration of Cl⁻) are shown in Fig. 6 for the purpose of illustrating the heterogeneity of the soil environment along the pipeline structure.

Fig. 6. Three critical soil properties along the right-of-way of the pipeline structure: resistivity (Ω·cm), pH and [Cl⁻] (mmol/L) plotted against mileage (m).

4.2. Clustering procedure

In the clustering procedure, only soil properties and location information are used. Each data point represents a corrosion defect associated with the seven soil properties, which were estimated by taking the average of the survey data from the two nearest stations. When using the FGM model to determine the number of components, the data are assumed to be generated from k mixed multivariate Gaussian distributions with different parameters. We run trials from k = 2 to k = 20 using five different test sets. Most of the BIC curves show a clear concave-up shape. The BIC values of the different test sets are shown in Fig. 7. Test sets 1, 2 and 5 achieved their minimum BIC at k = 5 for the diagonal covariance model having varying volumes and shapes. This implies that five clusters could be most preferable for our data. Therefore, we choose k = 5 as the number of components.

Fig. 7. The number of components vs. BIC for the covariance structure of diagonal type with varying volume and shape for five test sets.

Both FGM and HMRF-FGM models were applied to our data set. The classification results for 2005 and 2010 are shown in Fig. 8. As expected, the external corrosion propagation is heterogeneous along the pipeline and can be clustered into five groups. The FGM model is not able to reflect the spatial constraint, so the classes are mixed together in physical space (i.e. the one-dimensional space along the pipeline structure). However, the segments become very clear once the HMRF-FGM model is applied. We prefer to use the HMRF-FGM classification results for the soil corrosivity assessment because soil properties usually do not fluctuate rapidly within a small area, and hence the defects are believed to be spatially correlated. The centers (i.e. the means) of soil properties in feature space corresponding to different clusters are shown in Table 2.
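The choice of the number of components in Section 4.2 can be sketched as follows. The sketch assumes the common convention BIC = -2 ln L + p ln n, for which smaller values are preferred, consistent with the "minimum BIC" criterion used in the text; the exact definition adopted in the paper should be taken from its methodology section. The parameter count corresponds to a diagonal covariance with varying volume and shape, i.e. one variance per feature and per component. The function names are illustrative, and the maximised log-likelihoods are assumed to come from separate EM fits that are not shown here.

```python
import numpy as np


def n_free_params_diag(k, d):
    """Free parameters of a k-component, d-dimensional diagonal Gaussian mixture."""
    weights = k - 1    # mixing proportions sum to one
    means = k * d      # one mean vector per component
    variances = k * d  # one variance per feature and per component
    return weights + means + variances


def bic(log_likelihood, k, d, n):
    """BIC under the assumed convention -2 ln L + p ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_free_params_diag(k, d) * np.log(n)


def select_k(log_likelihoods, d, n):
    """Pick the k with the smallest BIC.

    log_likelihoods: dict mapping k to the maximised log-likelihood of the
    k-component fit (hypothetical input; the EM fitting itself is not shown).
    """
    scores = {k: bic(ll, k, d, n) for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores
```

For the seven soil properties (d = 7) and k = 5, this count gives 4 + 35 + 35 = 74 free parameters, which is the penalty that a larger k must offset through a higher likelihood.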

Fig. 8. Comparison of clustering results (cluster ID versus defect ID) via FGM and HMRF-FGM: (a) 2005 data; (b) 2010 data. Defects are numbered consecutively from the start station to the end station along the inspected pipeline interval.

4.3. Corrosivity assessment

The corrosion depth measured by ILI is taken as the indicator of soil corrosivity. Within each cluster, the soil properties are assumed to be homogeneous. The histograms of metal loss for different clusters are depicted in Fig. 9. It can be seen from this figure that the distribution of metal loss evolves with time. Since the 2005 data are truncated and we have no information on defect depths below 0.1 (%WT), the evolution of the corrosion propagation in each cluster may not be illustrated well. However, the variance of the defect depth can be related to the corrosion rate. This is because, within a given cluster, different defects have different starting times and hence different histories of corrosion propagation. A faster corrosion rate may lead to greater depths for older defects, while new defects keep being generated as shallow pits. Cluster 2 in Fig. 9 illustrates this phenomenon very well. In a moderate soil environment, a segment will show less difference in corrosion propagation between old defects and young ones, as can be seen in Cluster 3 and Cluster 4 in Fig. 9.

For comparison of the soil corrosivity between any two clusters, we adopt the median as the representative value because it is more robust in the current context. Fig. 10 shows the estimated medians for the five clusters. The 95% confidence intervals are represented by error bars, which are obtained by employing a bootstrap methodology. Comparison of the widths of the confidence intervals implies that the uncertainty of the median varies along the pipeline. This is partly because the sizes of the clusters vary considerably. A smaller cluster size usually leads to a higher uncertainty of the estimators (e.g. Cluster 4 from the 2005 data). Fig. 10 indicates that Cluster 2 has the highest corrosion rate; Clusters 3 and 4 may have equivalent corrosion rates. Similarly, Clusters 1 and 5 may have equivalent corrosion rates, and these are higher than the rates in Clusters 3 and 4. It is worth noting that although the soil properties differ between clusters, the corrosion propagations may still be similar because of the complexity of the corrosion process under in-situ conditions.
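The confidence intervals of the medians discussed above can be approximated with a percentile bootstrap, sketched below. The text only states that a bootstrap methodology was employed, so the percentile variant, the number of resamples and the function name are assumptions made for illustration.

```python
import numpy as np


def bootstrap_median_ci(depths, n_boot=2000, level=0.95, seed=0):
    """Sample median and percentile-bootstrap confidence interval.

    depths: defect depths (%WT) of the defects assigned to one cluster.
    Returns (median, lower bound, upper bound).
    """
    depths = np.asarray(depths, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the cluster with replacement n_boot times
    idx = rng.integers(0, depths.size, size=(n_boot, depths.size))
    medians = np.median(depths[idx], axis=1)
    alpha = 1.0 - level
    lower, upper = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return float(np.median(depths)), float(lower), float(upper)
```

Applying the function to the depths of each cluster separately gives one error bar per cluster; clusters with few defects tend to produce wider intervals, which is the behaviour noted for Cluster 4 of the 2005 data.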

5. Discussions

5.1. Clustering-based inspection strategy

Based on the clustering result and the corrosivity assessment, different inspection strategies can be applied to different segments of a pipeline structure. The concept is that for the segments with high corrosion rate, routine (or more frequent) inspections should be conducted to obtain more information about the evolution of corrosion depth as well as to locate and replace the failed segments. For the segments with moderate corrosion propagation, a relatively longer interval between inspections could be adopted. In practice, the rational optimal inspection scheme should be based on a stochastic model of corrosion growth [15,35] and system reliability analysis [36], which is outside the scope of this study. The clustering approach can provide categorized data that can be further processed with model calibration and system reliability analysis, so that the estimated reliability for current and future conditions is time- and position-dependent. An optimized inspection scheme can then be formed according to the tolerance of risk.

Table 2 The center of soil properties for the five clusters.

              C1         C2         C3         C4         C5
Eh            1.61E+02   2.38E+02   3.17E+02   3.38E+02   2.85E+02
Resistivity   3.96E+03   6.18E+03   6.66E+03   1.74E+04   5.12E+04
pH            6.24E+00   5.89E+00   4.97E+00   5.17E+00   4.59E+00
[CO₃²⁻]       7.52E-01   1.80E+00   6.15E-01   1.18E+00   5.75E-01
[HCO₃⁻]       3.48E+00   2.76E+00   1.64E+00   2.57E+00   2.70E+00
[Cl⁻]         4.78E+00   3.85E+00   2.37E+00   2.55E+00   2.36E+00
[SO₄²⁻]       1.78E-01   1.33E-01   6.32E-02   6.24E-02   6.40E-02

Fig. 9. Clustering result of HMRF-FGM: density histograms of metal loss (defect depth, %WT) for (a) Cluster 1; (b) Cluster 2; (c) Cluster 3; (d) Cluster 4; (e) Cluster 5; and (f) scatter plot of the reported metal loss against mileage (m) along the pipeline interval. Note: "C1-2005 MFL" means Cluster 1 of 2005 data using MFL technology.

Fig. 10. Estimation results of 95% confidence intervals of medians of defect depth (%WT) for the five clusters, for the 2005 and 2010 data.

5.2. The detectability and error of ILI device

The MFL data are truncated at metal loss = 0.1 due to the detection limit of the MFL device, and this results in missing data in the inspection report. Although very shallow defects are trivial for integrity assessment, information about shallow defects is essential for analyzing the evolution of defect growth. Moreover, the truncated data may lead to a biased estimate of the probability density function of the corrosion depth. Similarly, UT equipment also suffers from intrinsic measurement errors [37]. While UT is more sensitive to shallow defects owing to its physical principle, UT equipment cannot measure deep defects accurately since the second echo is difficult to detect for a very thin remaining wall [38]. In addition, it is known that ILI data are affected by systematic and random errors [39]. Therefore, a more realistic assessment of external corrosion should be based on corrected inspection data. This aspect is beyond the scope of the present paper, but it cannot be ignored.

5.3. The density of defects

While Cluster 2 has the highest corrosion rate, a higher number of corrosion defects are detected in Cluster 3 and Cluster 4. The topography of the studied pipeline interval, which ranges in elevation from 48 m to 2498 m, seems to be responsible for the frequency of the defects instead of the level of soil corrosivity because of (1) initial damage during construction and (2) the structure-soil interaction. During the installation of the pipe and during the service period, the coating may easily become damaged by the rock and gravel under the pipe in mountainous terrain (from the 58 km point along the pipeline to the 110 km point). About 65% of the defects at pipeline stations between 58 km and 110 km are located at the bottom surface of the pipeline, but only about 30% of defects occur at the bottom for the remaining stations. This makes geotechnical factors a plausible reason for the high frequency of detected defects. Therefore, the density of defects at a certain location does not directly reflect the level of corrosivity of the soil.

6. Conclusion

In this study, a clustering approach based on a hidden Markov random field is established for assessing external corrosion in a buried pipeline. The combination of hidden Markov random field theory and a finite mixture model can apply spatial constraint to the soil survey data, resulting in a classification that is more realistic and easier to interpret in terms of spatial correlation. The clustering analysis can aid in revealing the underlying pattern of the soil properties and categorizing the corrosion defects by searching for homogeneous segments in a heterogeneous random field of soil properties. The numerical simulation shows that different neighborhood systems impose different spatial constraints on each site and that, usually, a relatively weaker field (i.e. p(x) with small k) is preferred since it is more flexible than a stronger field. In addition, the HMRF-FGM model is found to be more robust than the conventional FGM model; hence the estimation result for the HMRF-FGM model should be more reliable. To illustrate the application of the approach, it was applied to a real-life pipeline structure. The variation of corrosion propagation along the pipeline interval is related to the heterogeneous soil environment along the right-of-way of the pipeline.
The categorized data reveal that different soil environments may lead to similar corrosion propagation, which implies the difficulty of assessing soil corrosivity levels using only the in-situ soil survey information. The integration of soil survey data and in-line inspection data proves to be a better scheme, and this scheme can be integrated into a clustering-based inspection strategy that can improve and guide the maintenance practice.

Acknowledgements

The authors would like to acknowledge the Secretaría de Energía (SENER) and the Consejo Nacional de Ciencia y Tecnología (CONACYT) in Mexico for the financial support to perform the work in this project (under SENER-CONACYT contract number 159913).

References

[1] Abes JA, Salinas JJ, Rogers J. Risk assessment methodology for pipeline systems. Struct Saf 1985;2:225–37.
[2] Engelhardt GR, Macdonald DD. Unification of the deterministic and statistical approaches for predicting localized corrosion damage. I. Theoretical foundation. Corros Sci 2004;46:2755–80.
[3] Romanoff M. Underground corrosion. US Government Printing Office; 1957.
[4] Shibata T. 1996 WR Whitney Award Lecture: statistical and stochastic approaches to localized corrosion. Corrosion 1996;52:813–30.
[5] Velazquez JC, Caleyo F, Valor A, Hallen JM. Predictive model for pitting corrosion in buried oil and gas pipelines. Corrosion 2009;65:332–42.
[6] Provan J, Rodriguez III E. Part I: Development of a Markov description of pitting corrosion. Corrosion 1989;45:178–92.
[7] Melchers RE. Representation of uncertainty in maximum depth of marine corrosion pits. Struct Saf 2005;27:322–34.
[8] Kamrunnahar M, Urquidi-Macdonald M. Data mining of experimental corrosion data using Neural Network. ECS Trans 2006;1:71–9.
[9] Sinha SK, Pandey MD. Probabilistic neural network for reliability assessment of oil and gas pipelines. Comput Aided Civil Infrastruct Eng 2002;17:320–9.
[10] Melchers RE. Progression of pitting corrosion and structural reliability of welded steel pipelines. Oil and Gas Pipelines: Integrity and Safety Handbook 2015:327–42.
[11] Frankel G. Pitting corrosion of metals: a review of the critical factors. J Electrochem Soc 1998;145:2186–98.
[12] Ferreira CAM, Ponciano JA, Vaitsman DS, Pérez DV. Evaluation of the corrosivity of the soil through its chemical composition. Sci Total Environ 2007;388:250–5.
[13] Hong H. Inspection and maintenance planning of pipeline under external corrosion considering generation of new defects. Struct Saf 1999;21:203–22.
[14] Alamilla JL, Sosa E. Stochastic modelling of corrosion damage propagation in active sites from field inspection data. Corros Sci 2008;50:1811–9.
[15] Zhang S, Zhou W, Qin H. Inverse Gaussian process-based corrosion growth model for energy pipelines considering the sizing error in inspection data. Corros Sci 2013;73:309–20.
[16] Hong H, Ye W. Estimating extreme wind speed based on regional frequency analysis. Struct Saf 2014;47:67–77.
[17] Pinto J, Blockley D, Woodman N. The risk of vulnerable failure. Struct Saf 2002;24:107–22.
[18] Agarwal J, Blockley D, Woodman N. Vulnerability of structural systems. Struct Saf 2003;25:263–86.
[19] Wang H, Yajima A, Liang R, Castaneda H. Bayesian modeling of external corrosion in underground pipelines based on the integration of Markov chain Monte Carlo techniques and clustered inspection data. Comput Aided Civil Infrastruct Eng 2014;30:300–16.
[20] McLachlan G, Peel D. Finite mixture models. John Wiley & Sons; 2004.
[21] Chatzis SP, Varvarigou TA. A fuzzy clustering approach toward hidden Markov random field models for enhanced spatially constrained image segmentation. IEEE Trans Fuzzy Syst 2008;16:1351–61.
[22] Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 1984;PAMI-6:721–41.
[23] Besag J. On the statistical analysis of dirty pictures. J Roy Stat Soc 1986;48:259–302.
[24] Li SZ. Markov random field modeling in computer vision. New York: Springer-Verlag, Inc.; 1995.
[25] Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 2001;20:45–57.
[26] Celeux G, Govaert G. Gaussian parsimonious clustering models. Pattern Recogn 1995;28:781–93.
[27] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 1977;39:1–38.
[28] Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 1998;41:578–88.
[29] Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 2002;97:611–31.
[30] Qian W, Titterington D. Multidimensional Markov chain models for image textures. J Roy Stat Soc: Ser B (Methodol) 1991:661–74.
[31] Yuen KV, Mu HQ. Peak ground acceleration estimation by linear and nonlinear models with reduced order Monte Carlo simulation. Comput Aided Civil Infrastruct Eng 2011;26:30–47.
[32] McLachlan GJ, Basford KE. Mixture models. Inference and applications to clustering. Statistics: Textbooks and Monographs. New York: Dekker; 1988.
[33] McLachlan GJ, Krishnan T. The EM algorithm and extensions. Wiley-Interscience; 2007.
[34] Rodriguez III E, Provan J. Part II: development of a general failure control system for estimating the reliability of deteriorating structures. Corrosion 1989;45:193–206.
[35] Gomes WJ, Beck AT. Optimal inspection and design of onshore pipelines under external corrosion process. Struct Saf 2014;47:48–58.
[36] Zhou W. System reliability of corroding pipelines. Int J Press Vessels Pip 2010;87:587–95.
[37] Wang H, Yajima A, Liang RY, Castaneda H. A Bayesian model framework for calibrating ultrasonic in-line inspection data and estimating actual external corrosion depth in buried pipeline utilizing a clustering technique. Struct Saf 2015;54:19–31.
[38] Goedecke H. Ultrasonic or MFL inspection: which technology is better for you. Pipeline Gas J 2003;230:34–41.
[39] Caleyo F, Alfonso L, Espina-Hernández J, Hallen J. Criteria for performance assessment and calibration of in-line inspections of oil and gas pipelines. Meas Sci Technol 2007;18:1787–99.