A clustering approach for assessing external corrosion in a buried pipeline based on hidden Markov random field model

Structural Safety 56 (2015) 18–29 Contents lists available at ScienceDirect Structural Safety journal homepage: www.elsevier.com/locate/strusafe A ...



Hui Wang a, Ayako Yajima a, Robert Y. Liang a,⇑, Homero Castaneda b,⇑

a Department of Civil Engineering, The University of Akron, Akron, OH 44325-3905, USA
b Department of Material Science and Engineering, National Corrosion Center, Texas A&M University, College Station, TX 77845, USA

Article info

Article history: Received 9 May 2014; received in revised form 18 May 2015; accepted 20 May 2015

Keywords: Pipeline; External corrosion; Finite mixture model; Hidden Markov random field; Clustering; In-line inspection

Abstract

This paper describes the use of a clustering approach based on a hidden Markov random field to extract potentially homogeneous segments from a long pipeline right-of-way with heterogeneous soil properties. This approach extends the conventional finite mixture model so that the spatial correlation of external corrosion sites can be taken into consideration. An algorithm is established for classifying corrosion defects using soil properties from an in-situ survey and location information from in-line inspection reports. The categorized corrosion defects reveal the hidden patterns of corrosion degradation in different segments along a pipeline structure. Stochastic simulation is employed to test this clustering approach. An example involving a 110-km pipeline interval is employed to illustrate the implementation of the clustering approach. The results indicate that the process of external corrosion propagation in a buried pipeline is position-dependent and is highly related to the soil environment. In addition, the results show that this phenomenon can be interpreted by segmentation using the proposed clustering method. A clustering-based inspection strategy is discussed as a way to apply the present approach. © 2015 Elsevier Ltd. All rights reserved.

⇑ Corresponding authors. Tel.: +1 (330) 972 7190; fax: +1 (330) 972 6020 (R.Y. Liang). E-mail addresses: [email protected] (H. Wang), [email protected] (A. Yajima), [email protected] (R.Y. Liang), [email protected] (H. Castaneda).
http://dx.doi.org/10.1016/j.strusafe.2015.05.002
0167-4730/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

An underground pipeline is a superior choice for the long-distance transportation of oil, gas and other products for downstream operations in the petroleum industry. However, soil corrosivity can decrease the performance of pipeline protection systems (i.e. direct protection such as coatings, as well as protection from an external source such as cathodic protection) to varying degrees and can produce external corrosion defects that may lead to failure conditions [1]. Metal deterioration in corrosive environments has been well studied for specific types of corrosion and materials during the last few decades. Several approaches (e.g., deterministic approaches [2,3], probabilistic models [4–7] and data mining approaches [8,9]) have been applied to describe corrosion propagation on metal surfaces. For the progression of pitting corrosion in pipeline structures, a comprehensive review is available in [10]. It is well known that the corrosivity of the surrounding soil as well as the physicochemical characteristics of materials greatly affect the degradation rate [11,12]. In the context of a buried pipeline structure, because a pipeline is typically very long, the soil properties along the pipeline right-of-way can vary to a significant degree. Hence, the external corrosion propagation is position-dependent.

In terms of probability and stochastic processes for assessing corrosion propagation, a notable limitation of some previous models [13–15] is the lack of consideration of the spatial variability of soil properties. To account for the spatial variation of the soil corrosivity along a pipeline infrastructure, a practical approach is to divide the pipeline into equal segments named zones. Each zone has a set of physical and chemical properties of the soil surrounding the pipeline structure. The corrosion depth can be measured and located with an in-line inspection (ILI) device; hence, each sized defect is linked with a zone and its soil properties. All the soil properties form a high-dimensional feature space, and the defects are mapped into the feature space as vectors. Clustering techniques can then be employed to discover the intrinsic structure or hidden pattern of data points in the feature space. As the soil properties within each cluster will have high similarity, it is relatively easy to estimate the probability distribution of the corrosion depth within each cluster. Several clustering techniques have been successfully applied to extreme wind speed analysis [16], structural vulnerability analysis [17,18] in civil engineering and external corrosion in pipeline structures without considering spatial correlation [19]. However, it is novel to go one step further and apply a clustering approach that considers spatial correlation in assessing external corrosion in a pipeline structure. Since the corrosion growth of defects that are close to each other may be similar, spatial correlation could apply a certain constraint to the corrosion propagation.

The hidden Markov random field (HMRF) is introduced here to handle the spatially constrained problem because it provides a convenient way of formulating an extension of a finite mixture model for spatially dependent data [20]. The merit of the HMRF derives from Markov random field (MRF) theory, in which the spatial information is encoded through the constraints of the neighborhood configuration, as we expect neighboring defects to have the same class label and a similar corrosion degradation process. The configuration of the state sequence (i.e. the sequence of cluster labels assigned to the defects) of the HMRF is unobservable, but it can be inferred through a random field of features (i.e., the soil properties) [21]. HMRF is a powerful tool for considering spatial correlation in image segmentation [22–25], and in this work, we apply it in the context of assessing the external corrosion of a pipeline.

In the present work, hidden Markov random field theory and the finite Gaussian mixture (FGM) model are employed to develop a clustering approach named the HMRF-FGM model, which can be used to identify and extract segments with homogeneous soil properties from a long pipeline region with a heterogeneous soil environment. This approach is compared to the conventional FGM model [20] to illustrate its accuracy and robustness. By applying the HMRF-FGM model, defect depth records reported by two consecutive in-line inspections are classified into groups corresponding to the homogeneous segments. The evolution of corrosion propagation within different segments is studied.
A cluster-based inspection strategy for improving the existing maintenance policy is discussed as one of the potential applications of the present approach. The present paper is organized into six sections. In Section 2, we describe the framework of the clustering approach. A parameter study and robustness assessments of HMRF-FGM are performed in Section 3. The HMRF-FGM is applied to a real-life pipeline structure in Section 4. A clustering-based inspection strategy and some potential issues are discussed in Section 5, and conclusions are presented in Section 6.

2. Framework of the clustering approach

In the framework for the clustering approach, we consider $L = \{1, 2, 3, \ldots, l\}$ as the set of states (i.e. the cluster ID of a defect). Let $S = \{1, 2, 3, \ldots, s\}$ be a finite index set; we refer to $S$ as the set of locations or sites of all the defects from the start point to the end point along the interval of interest in the pipeline. Now let $X_j = \{x_j \mid x_j \in L\}$ be the state space of a particular site $j \in S$. Then the product space $X = \prod_{j=1}^{s} X_j = L^s$ is defined as the configuration space. The state values of all the sites, $x = (x_1, x_2, x_3, \ldots, x_s)$, are a configuration in the product space $X$. The configuration is considered as a random field p(x). Then, let us consider a second random field p(y), the state space of which is also a product space, denoted as $Y = \prod_{j=1}^{s} Y_j$ with $Y_j = \{y_j \mid y_j \in \mathbb{R}^d\}$, and $y = (y_1, y_2, y_3, \ldots, y_s)$ is a realization of p(y). In the context of pipeline corrosion, the random variable $Y$ is a random vector of the soil features and the d-dimensional space $\mathbb{R}^d$ is the feature space formed by all soil properties. Suppose a defect at site $j$ is assigned a certain label $l$ (i.e. defect $j$ belongs to cluster $l$). The probability of the observation of local soil properties $y_j$ will follow a conditional probability distribution:

$$ p(y_j \mid x_j = l) = f(y_j \mid \theta_l), \quad l \in L \tag{1} $$

where $\theta_l$ is the set of distribution parameters. It is worth noting that for all $l \in L$, the distribution family $f(\cdot\,; \theta_l)$ has the same known analytical form. In this work, we employ the finite Gaussian mixture (FGM) model as the conditional probability distribution. Before the derivation of the HMRF-FGM model, we must first briefly introduce the FGM model.

2.1. Finite Gaussian mixture model

We suppose that all the defects can be classified into $l$ clusters. The density $f(y_j)$ of $Y_j$ can be written in the form

$$ f(y_j \mid \Phi) = \sum_{i=1}^{l} \pi_i f_i(y_j \mid \theta_i) \tag{2} $$

where the $\pi_i$ are nonnegative quantities that sum to one and are referred to as the mixing proportions or weights. The $\pi_i$ are independent of the individual defect $j \in S$. The $f_i(y_j \mid \theta_i)$ are referred to as the component densities. Here, we use the multivariate Gaussian distribution as the component density, i.e. $\theta_i = (\mu_i, \Sigma_i)$, $i \in L$, where $\mu$ is a vector that contains the mean of every feature and $\Sigma$ is the covariance structure of all the features, which is defined in [26]. We take $\Phi = \{\pi_i, \theta_i \mid i \in L\}$ as the parameter set of the mixture model. This is the so-called finite Gaussian mixture model.

The standard finite Gaussian mixture model presented in Eq. (2) is widely used in unsupervised clustering [20]. However, it is not a complete model for spatially constrained problems, as it only describes the data statistically and involves no spatial information. In Eq. (2), the mixing proportions are spatially independent, which means that the probability of a defect belonging to a certain cluster is homogeneous along the entire pipeline interval. To overcome this drawback of the FGM model, a clustering model that adapts to spatial information, based on the hidden Markov random field model, is presented below.

2.2. Hidden Markov random field – Finite Gaussian mixture model

A typical HMRF has the following three characteristics [25]:

(1) The configuration of all sites $x = (x_1, x_2, x_3, \ldots, x_s)$, i.e. the clustering result, is unobservable.
(2) The observable random field p(y|x) is called the observed or emitted random field given any particular configuration $x = (x_1, x_2, x_3, \ldots, x_s)$.
(3) A typical conditional independence assumption is adopted, which has the expression:

$$ p(y \mid x) = \prod_{j=1}^{s} p(y_j \mid x_j) \tag{3} $$
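As a minimal sketch of what the factorization in Eq. (3) means in practice, the log of the emitted field is just a sum of per-site Gaussian log densities. The function names, the two hypothetical clusters and the diagonal covariance are assumptions made here to keep the example free of matrix algebra; the paper itself uses a full covariance structure.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log density of a Gaussian with diagonal covariance (a simplifying assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def log_emission(ys, labels, params):
    """log p(y | x) = sum over sites of log p(y_j | x_j), i.e. the log of Eq. (3)."""
    return sum(log_gauss_diag(y, *params[l]) for y, l in zip(ys, labels))

# Hypothetical two-cluster example with two soil features per site.
params = {1: ([0.0, 0.0], [1.0, 1.0]), 2: ([5.0, 5.0], [1.0, 1.0])}
ys = [[0.1, -0.2], [0.3, 0.1], [4.8, 5.2]]
good = log_emission(ys, [1, 1, 2], params)  # labels that match the data
bad = log_emission(ys, [2, 2, 1], params)   # labels swapped
print(good > bad)
```

A labeling that matches the data should score a higher log-likelihood than a swapped one, which is the only property the later estimation steps rely on.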

The conditional independence assumption still guarantees the Markovianity of the random field $p(x, y)$ [21]. More complicated assumptions might also be considered [20]. Based on the above, the joint probability of (X, Y), i.e. a clustering configuration and the corresponding observations of soil properties, has the form

$$ P(x, y) = P(x)\,P(y \mid x) = P(x) \prod_{j=1}^{s} p(y_j \mid x_j) \tag{4} $$

According to Gibbs models [22], the probability distribution function of a certain configuration of sites $x = (x_1, x_2, x_3, \ldots, x_s)$ has the form:

$$ P(x) = \frac{1}{Z} \exp\left(-U(x)/T\right) \tag{5} $$

where $Z$ is a normalizing constant called the partition function, $T$ stands for temperature (discussed in more detail later), and $U$ is the energy function, which has the form:

$$ U(x) = \sum_{c \in C} V_c(x_i, x_j), \quad i, j \in c,\ i \neq j \tag{6} $$

where $c$ is a clique (i.e. two consecutive sites $j$ and $j + 1$ in the context of a one-dimensional Markov random field) and $C$ is the set of all cliques within $S$. $V_c$ is referred to as the clique potential, which depends on the local configuration of $c$ and has the following expression:

$$ V_c(x_i, x_j) = -\beta\,\delta(x_i - x_j) \tag{7} $$

where $\delta(\cdot)$ stands for the Kronecker delta function

$$ \delta(x_i - x_j) = \begin{cases} 1, & \text{if } x_i = x_j \\ 0, & \text{otherwise} \end{cases} \tag{8} $$
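The energy in Eqs. (6)–(8) can be illustrated with a few lines of code. This is only a sketch: it assumes the sign convention $V_c = -\beta\,\delta(x_i - x_j)$ with $\beta > 0$ (so that agreeing neighbors lower the energy) and cliques made of consecutive sites; the function names are illustrative.

```python
import math

def potts_potential(xi, xj, beta):
    """Clique potential: -beta when the two labels agree, 0 otherwise (assumed sign)."""
    return -beta if xi == xj else 0.0

def gibbs_energy(x, beta):
    """U(x): sum of clique potentials over all pairs of consecutive sites."""
    return sum(potts_potential(a, b, beta) for a, b in zip(x, x[1:]))

smooth = [1, 1, 1, 2, 2]   # one label change along the chain
rough = [1, 2, 1, 2, 1]    # a label change at every step
beta, temperature = 1.0, 1.0
# The partition function Z cancels in a ratio of two Gibbs probabilities.
ratio = math.exp(-(gibbs_energy(smooth, beta) - gibbs_energy(rough, beta)) / temperature)
print(gibbs_energy(smooth, beta), gibbs_energy(rough, beta), ratio)
```

Under these assumptions the smooth configuration has energy $-3\beta$, the alternating one has energy 0, and the smooth one is $e^{3}$ times more probable at $T = 1$.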

Because of the one-dimensional spatial relationship among the corrosion defects along the longitudinal direction of a pipeline structure, we consider the random field $p(x)$ as an ordinary one-dimensional Markov chain relative to a neighborhood system on $S$. Let $\partial$ denote the neighborhood system, i.e. a collection $\partial = \{\partial_j \mid j \in S\}$ of sets $\partial_j = \{s : |s - j| \le \lambda,\ s \neq j\}$, where the integer $\lambda$ is the order of the Markov chain. The "two-sided" Markov property is shown in Eq. (9):

$$ P(X_k = x_k \mid X_j = x_j,\ j \neq k) = P(X_k = x_k \mid X_j = x_j,\ j \in \partial_k), \quad \forall k \in S \tag{9} $$
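The neighborhood system defined above is easy to make concrete. The helper below is a sketch with an illustrative name; it uses the 1-based site numbering of $S = \{1, \ldots, s\}$ and truncates the neighborhood at the two ends of the pipeline interval, which is an assumption about boundary handling.

```python
def neighborhood(j, s, order):
    """Sites within `order` positions of site j on a chain of sites 1..s, excluding j."""
    return [i for i in range(max(1, j - order), min(s, j + order) + 1) if i != j]

print(neighborhood(5, 10, 2))   # interior site
print(neighborhood(1, 10, 2))   # first site: neighbors on one side only
```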

In other words, the sites of corrosion defects in $S$ are related to each other via a neighborhood system in terms of spatial constraint. We are interested in finding the conditional probability of the configuration at a certain site given a neighborhood system, $P(x_j \mid x_{\partial_j})$. According to the "two-sided" Markov property [22,23],

$$ P(x_j \mid x_{\partial_j}) = \frac{P(x_j, x_{S-\{j\}})}{\sum_{x_j \in L} P(x_j, x_{S-\{j\}})} = \frac{P(x_j, x_{\partial_j})}{\sum_{x_j \in L} P(x_j, x_{\partial_j})} = \frac{\exp\left(-\sum_{c \subset \partial_j} V_c / T\right)}{\sum_{x_j \in L} \exp\left(-\sum_{c \subset \partial_j} V_c / T\right)} \tag{10} $$

where the clique $c$ in Eq. (10) is defined as a pair of sites $(j, s)$, $s \in \partial_j$, to reflect the spatial constraint given the neighborhood system $\partial_j$ of site $j$. The joint probability of a pair $(X_j, Y_j)$, i.e. the cluster label $X_j$ of a defect at site $j$ and the corresponding soil properties $Y_j$, given the local neighborhood configuration $x_{\partial_j}$, has the expression:

$$ P(x_j, y_j \mid x_{\partial_j}) = P(x_j \mid x_{\partial_j})\,P(y_j \mid x_j) \tag{11} $$

In the hidden Markov random field, the configuration of all sites is unobservable in advance. We consider the marginal probability of $Y_j$ based on Eq. (11):

$$ P(y_j \mid x_{\partial_j}) = \sum_{l \in L} P(l \mid x_{\partial_j})\,f(y_j \mid \theta_l) \tag{12} $$

where $\theta_l = (\mu_l, \Sigma_l)$ is the parameter set of the Gaussian component distribution function. Eq. (12) is the so-called hidden Markov random field – finite Gaussian mixture (HMRF-FGM) model. Returning to the finite Gaussian mixture model, we can compare Eq. (2) with Eq. (12). It is worth noting that the mixing proportions $P(l \mid x_{\partial_j})$ in HMRF-FGM are determined by the local neighborhood configurations, whereas the mixing proportions $\pi_i$ in FGM do not change along the length of a pipeline structure. If we assume the site configurations $X_j$ are independent of each other:

$$ P(l \mid x_{\partial_j}) = P(l) = \pi_l \tag{13} $$

then Eq. (12) reduces to an FGM model. From this point of view, FGM is a reduced case of HMRF-FGM. Therefore, the HMRF-FGM model is able to encode both spatial and statistical information. It is more realistic for external corrosion modeling than FGM, since the local soil environment may cause similar corrosion propagation.

The parameters $\theta_l = (\mu_l, \Sigma_l)$ can be optimized with the expectation–maximization (EM) algorithm [27,28], which is a method for finding the maximum likelihood by optimizing the parameters under hidden conditions [29]. The EM algorithm for the HMRF is considerably more difficult, because it is typically impossible to exactly compute the log-likelihood [20]. One solution is to replace the E-step with a stochastic restoration step [30]. The idea is to update the site configuration $x = (x_1, x_2, x_3, \ldots, x_s)$ by using the maximum a posteriori (MAP) estimation. We will explain this algorithm in detail in Section 2.4, following an illustration of the MAP estimation process.

2.3. Maximum a posteriori (MAP) estimation

Given a realization of the random field p(y) (i.e., we have a set of soil properties along a pipeline structure), we aim to group together the corrosion defects that have a similar soil environment. In other words, we want to get a configuration of all sites $\hat{x}$, which is an estimate of the true classification $x$ (which is unknown). According to Bayes' theorem:

$$ P(x \mid y) \propto P(x)\,P(y \mid x) \tag{14} $$

where $P(x) \propto \exp(-U(x)/T)$ and $P(y \mid x) \propto \exp(-U(y \mid x))$. $U(y \mid x)$ is the likelihood energy given the configuration $x$ and has the form:

$$ U(y \mid x) = \sum_{j=1}^{s} \left[ \frac{1}{2} (y_j - \mu_{x_j})^T \Sigma_{x_j}^{-1} (y_j - \mu_{x_j}) + \frac{1}{2} \log|\Sigma_{x_j}| \right] \tag{15} $$

Eq. (15) comes from the joint likelihood function, which is based on the conditional independence assumption:

$$ P(y \mid x) = \prod_{j=1}^{s} P(y_j \mid x_j) = \prod_{j=1}^{s} \frac{1}{\sqrt{2\pi}} \exp\left\{ -\left[ \frac{1}{2} (y_j - \mu_{x_j})^T \Sigma_{x_j}^{-1} (y_j - \mu_{x_j}) + \frac{1}{2} \log|\Sigma_{x_j}| \right] \right\} \tag{16} $$

Hence $P(x \mid y) \propto \exp(-(U(x)/T + U(y \mid x)))$. $U(x)/T + U(y \mid x)$ is called the total posterior energy. According to the MAP criterion:

$$ \hat{x} = \arg\max_{x \in X} \{P(x)\,P(y \mid x)\} \tag{17} $$

Eq. (17) is equivalent to minimizing the total posterior energy function:

$$ \hat{x} = \arg\min_{x \in X} \{U(x)/T + U(y \mid x)\} \tag{18} $$
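To make the minimization in Eq. (18) tangible, the sketch below enumerates every configuration of a tiny chain and returns the one with the lowest total posterior energy. It is an illustration only: it assumes one feature per site, consecutive-site cliques with potential $-\beta$ for agreeing labels, and hypothetical cluster parameters. With $l$ labels and $s$ sites the search visits $l^s$ configurations, so it is usable only for toy sizes.

```python
import itertools
import math

def prior_energy(x, beta):
    """U(x): -beta for every pair of consecutive sites sharing a label (assumed sign)."""
    return sum(-beta for a, b in zip(x, x[1:]) if a == b)

def likelihood_energy(y, x, params):
    """U(y|x) of Eq. (15), specialised to a single feature per site."""
    total = 0.0
    for yj, xj in zip(y, x):
        mean, var = params[xj]
        total += 0.5 * (yj - mean) ** 2 / var + 0.5 * math.log(var)
    return total

def brute_force_map(y, params, beta, temperature):
    """Minimise U(x)/T + U(y|x) over all l**s configurations (Eq. (18))."""
    labels = sorted(params)
    return min(
        itertools.product(labels, repeat=len(y)),
        key=lambda x: prior_energy(x, beta) / temperature
        + likelihood_energy(y, x, params),
    )

params = {1: (0.0, 1.0), 2: (3.0, 1.0)}   # hypothetical (mean, variance) per cluster
y = [0.2, -0.1, 1.6, 0.3, 2.9, 3.2]
print(brute_force_map(y, params, beta=0.0, temperature=1.0))
print(brute_force_map(y, params, beta=2.0, temperature=1.0))
```

With $\beta = 0$ the third site (observation 1.6) is assigned to cluster 2 because it is slightly closer to that mean; with $\beta = 2$ the spatial term outweighs that small difference and the site joins its neighbors in cluster 1.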

Fig. 1. ICM-EM algorithm for clustering based on the HMRF-FGM model, where the convergence criterion is that the relative difference between two successive values of the indicator variable (i.e. $U(x)/T(t) + U(y \mid x)$ or the Q-function value) falls below a small threshold.

It is computationally infeasible to perform trials of all the possible configurations. Therefore, an iterative optimization technique is needed here to update $\hat{x}$ from an initial estimation $x^0$. One practical way is to update the state $X_j$ of one site $j$ at a time, proceeding from the first site to the last site along a pipeline structure. This procedure defines a single cycle of an iterative algorithm for estimating $x$, and this is the main idea of the algorithm referred to as iterated conditional modes (ICM) [23]. Within the t-th iteration, a plausible choice of the updated cluster label $x_j^{(t)}$ of a site $j$ is the cluster ID $l \in L$ that minimizes the local posterior energy function, given the soil properties at site $j$ and the current local configuration:

$$ x_j^{(t)} = \arg\min_{l \in L} \left\{ \frac{1}{2} (y_j - \mu_l^{(t)})^T (\Sigma_l^{(t)})^{-1} (y_j - \mu_l^{(t)}) + \frac{1}{2} \log|\Sigma_l^{(t)}| + \sum_{i \in \partial_j} \left[-\beta\,\delta(l - \tilde{x}_i)\right] / T(t) \right\} \tag{19} $$

where $\tilde{x}_i$ is the current configuration of the neighbors of site $j$: for $i < j$, $\tilde{x}_i = x_i^{(t)}$; for $i > j$, $\tilde{x}_i = x_i^{(t-1)}$. In the framework of ICM, the local conditional probability $P(x_j \mid x_{\partial_j})$ depends on a global control parameter $T$ (as was defined in Eq. (10)). At low temperatures, the local conditional probability tends to strengthen the spatial constraint, whereas at high temperatures the random field is less constrained (i.e., the field tends to be position-independent). A wise strategy is to set a high temperature in the early cycles, so that the labels are not fixed too early on the basis of an unreliable initial estimation, and then to decrease the temperature $T$ slowly. This process is also called annealing. In the t-th iteration, the temperature satisfies the following expression [22]:

$$ T(t) \ge \frac{c}{\log(1 + t)} \tag{20} $$

where $c$ is a constant independent of the loop index $t$; in the context of our problem, $c = \beta$. Although annealing requires a few more iterations, the result is more robust than that of a scheme that does not consider temperature. For simplicity and without losing rationality, we use the boundary value, which means:

$$ \frac{\beta}{T(t)} = \log(1 + t) \tag{21} $$
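A short numerical sketch of the schedule in Eqs. (20) and (21) may help: taking the boundary value with $c = \beta$, the temperature falls as $\beta / \log(1 + t)$ and the effective weight on each agreeing neighbor, $\beta / T(t)$, grows as $\log(1 + t)$. The function names are illustrative.

```python
import math

def temperature(t, beta):
    """Boundary value of Eq. (20) with c = beta: T(t) = beta / log(1 + t)."""
    return beta / math.log(1.0 + t)

def coupling(t):
    """beta / T(t) = log(1 + t), the weight on each agreeing neighbor (Eq. (21))."""
    return math.log(1.0 + t)

for t in range(1, 8):
    print(t, round(temperature(t, beta=1.0), 3), round(coupling(t), 3))
```

Note that the weight in Eq. (21) no longer depends on $\beta$, so under this choice the strength of the spatial constraint is controlled by the iteration index alone.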

For practical purposes, the temperature must not keep decreasing without control. If it does, it will result in a single-cluster solution, which is meaningless for our problem. Hence, a maximum number of iterative cycles $t_{\max}$ for annealing should be set in advance. Then, Eq. (19) becomes:

$$ x_j^{(t)} = \arg\min_{l \in L} \left\{ \frac{1}{2} (y_j - \mu_l^{(t)})^T (\Sigma_l^{(t)})^{-1} (y_j - \mu_l^{(t)}) + \frac{1}{2} \log|\Sigma_l^{(t)}| - \sum_{i \in \partial_j} \delta(l - \tilde{x}_i) \log(1 + \tilde{t}) \right\} \tag{22} $$

where $\tilde{t} = t$ for $t \le t_{\max}$ and $\tilde{t} = t_{\max}$ for $t > t_{\max}$. In general, a relatively weaker p(x) should be used in ICM [23]; in our problem, a value of $t_{\max} = 7$ seems to be appropriate because, after seven iterative loops, the decrement of the local posterior energy is no longer significant. After each cycle of iteration, we estimate the updated total posterior energy. This strategy can find the minimum of the total posterior energy very quickly, and it will guarantee convergence [23].
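The site-by-site update of Eq. (22) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: one feature per site, known and fixed cluster parameters, 0-based site indices, and the helper names `neighbors`, `local_energy` and `icm` are all hypothetical. Updating the list in place is what makes sites $i < j$ contribute their new labels and sites $i > j$ their previous ones.

```python
import math

def neighbors(j, s, lam):
    """Indices within lam positions of site j on a chain of s sites (0-based)."""
    return [i for i in range(max(0, j - lam), min(s, j + lam + 1)) if i != j]

def local_energy(yj, l, params, x, j, lam, t, t_max):
    """Local posterior energy of label l at site j, following Eq. (22)."""
    mean, var = params[l]
    data_term = 0.5 * (yj - mean) ** 2 / var + 0.5 * math.log(var)
    agree = sum(1 for i in neighbors(j, len(x), lam) if x[i] == l)
    return data_term - agree * math.log(1.0 + min(t, t_max))

def icm(y, x0, params, lam=1, t_max=7, cycles=10):
    """Sweep from the first site to the last, `cycles` times, updating labels in place."""
    x = list(x0)
    for t in range(1, cycles + 1):
        for j in range(len(x)):
            x[j] = min(
                sorted(params),
                key=lambda l: local_energy(y[j], l, params, x, j, lam, t, t_max),
            )
    return x

params = {1: (0.0, 1.0), 2: (3.0, 1.0)}   # hypothetical (mean, variance) per cluster
y = [0.2, -0.1, 1.6, 0.3, 2.9, 3.2]
x0 = [1, 1, 2, 1, 2, 2]                   # site-independent initial labels
print(icm(y, x0, params))
```

In this toy run the isolated label at the third site is absorbed by its neighbors during the first sweep, and later sweeps leave the configuration unchanged.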

2.4. ICM-EM clustering algorithm

In the HMRF-FGM model, the class label of each site $X_j$ and the parameters of each cluster $\theta_l = (\mu_l, \Sigma_l)$ are interdependent. At the very beginning, both the site configuration and the distribution parameters are unknown. The data set is said to be "incomplete" and the clustering problem is regarded as an "incomplete-data" problem [25].

Fig. 2. Simulation result of a grouping configuration. (a) Synthetic configuration (cluster ID versus defect ID): in this example, there are 500 corrosion defects, which are classified into four clusters; for illustration purposes, two realizations from the emission distributions are shown: (b) half-cell potential (Eh) versus defect ID; (c) soil resistivity (Ω·cm) versus defect ID.

Fig. 3. Classification results for three different neighborhood systems using the ICM-EM algorithm (cluster ID versus defect ID). (a) Original classification; (b) $\lambda = 1$; (c) $\lambda = 4$; (d) $\lambda = 7$; (e) effect of $\lambda$ on the accuracy of clustering results (MCR versus $\lambda$).

The ICM-EM algorithm is based on the HMRF-FGM model. The ICM-EM algorithm that is employed to solve the "incomplete-data" problem is a modification of the conventional EM algorithm. Given the previous estimation of $x^{(t-1)} = (x_1^{(t-1)}, x_2^{(t-1)}, x_3^{(t-1)}, \ldots, x_s^{(t-1)})$ and the current estimation of $\Theta^{(t)} = (\theta_1^{(t)}, \theta_2^{(t)}, \theta_3^{(t)}, \ldots, \theta_l^{(t)})$, this strategy mainly contains two steps within a loop of iteration. First, in the E-step, $x^{(t-1)}$ is updated to $x^{(t)}$ by employing ICM and taking $\Theta^{(t)}$ and the observed soil properties $y$ as inputs. Having a current configuration $x^{(t)}$, the posterior probability $P^{(t)}(l \mid y_j)$ is evaluated for all sites, and it has the following expression:

$$ P^{(t)}(l \mid y_j) = \frac{P^{(t)}(l \mid x_{\partial_j})\, f(y_j \mid \theta_l^{(t)})}{\sum_{l \in L} P^{(t)}(l \mid x_{\partial_j})\, f(y_j \mid \theta_l^{(t)})} \tag{23} $$

Second, in the M-step, the updated estimates of parameters $\Theta^{(t+1)}$ are calculated by the maximization of the conditional expectation function, which is referred to as the Q-function, based on the posterior probability $P^{(t)}(l \mid y_j)$ and the current $\Theta^{(t)}$:

$$ Q = \sum_{j \in S} \sum_{l \in L} P(l \mid y_j) \left[ \log P(l \mid x_{\partial_j}) - \frac{1}{2} \log|\Sigma_l| - \frac{1}{2} (y_j - \mu_l)^T \Sigma_l^{-1} (y_j - \mu_l) - \frac{1}{2} \log(2\pi) \right] \tag{24} $$

Actually, we perform this maximization approximately under the conditional independence assumption. Initial estimates are generated by employing a conventional EM algorithm. In the E-step, we ignore the spatial constraint and merely estimate $P(l \mid y_j)$ at each site $j$ independently. A detailed description of the EM algorithm is available in the literature [20].
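The E-step posterior of Eq. (23) and a matching M-step can be sketched in a few functions. Several things here are assumptions rather than statements from the paper: one feature per site, the Potts form of the local prior from Eq. (10) with `coupling` standing for $\beta/T$, and closed-form responsibility-weighted updates for the mean and variance, which is the standard Gaussian M-step and is what maximizing Eq. (24) over $\mu_l$ and $\Sigma_l$ gives when the labels in the prior term are held fixed.

```python
import math

def gauss(y, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def local_prior(x, j, labels, coupling, lam=1):
    """P(l | neighbors of j) as in Eq. (10) with Potts potentials; coupling = beta / T."""
    nbrs = [i for i in range(max(0, j - lam), min(len(x), j + lam + 1)) if i != j]
    weights = {l: math.exp(coupling * sum(x[i] == l for i in nbrs)) for l in labels}
    z = sum(weights.values())
    return {l: w / z for l, w in weights.items()}

def posterior(y, x, params, coupling, lam=1):
    """Eq. (23): P(l | y_j) for every site j, given the current labels x."""
    labels = sorted(params)
    post = []
    for j, yj in enumerate(y):
        prior = local_prior(x, j, labels, coupling, lam)
        joint = {l: prior[l] * gauss(yj, *params[l]) for l in labels}
        z = sum(joint.values())
        post.append({l: joint[l] / z for l in labels})
    return post

def m_step(y, post, labels):
    """Responsibility-weighted mean and variance per cluster."""
    new = {}
    for l in labels:
        w = [p[l] for p in post]
        total = sum(w)
        mean = sum(wi * yi for wi, yi in zip(w, y)) / total
        var = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / total
        new[l] = (mean, var)
    return new

params = {1: (0.0, 1.0), 2: (3.0, 1.0)}   # hypothetical current estimates
y = [0.2, -0.1, 1.6, 0.3, 2.9, 3.2]
x = [1, 1, 1, 1, 2, 2]                    # labels after an ICM sweep
post = posterior(y, x, params, coupling=math.log(2.0))
print(m_step(y, post, sorted(params)))
```

Setting `coupling` to zero makes the local prior uniform, which corresponds to ignoring the spatial constraint as in the initialization described above.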

Table 1a
The original parameters of the emission distributions.

                 C1^a                    C2                      C3
Property         Mean        Variance    Mean        Variance    Mean        Variance
Eh^b             -2.68E+02   3.08E+03    -2.23E+02   2.76E+03    -2.97E+02   1.88E+03
Resistivity      8.81E+03    2.56E+07    1.97E+03    1.64E+06    5.86E+04    3.19E+09
pH               5.21E+00    3.90E-01    6.11E+00    3.46E-01    4.30E+00    6.29E-01
[CO3^2-]^c       1.00E+00    6.61E-01    1.06E+00    4.60E-01    1.35E-01    8.58E-02
[HCO3^-]         2.57E+00    3.06E+00    3.10E+00    2.25E+00    2.03E+00    1.18E+00
[Cl^-]           2.72E+00    7.01E-01    3.75E+00    8.01E-01    2.11E+00    2.04E-01
[SO4^2-]         6.67E-02    1.65E-03    1.10E-01    4.50E-03    5.25E-02    3.44E-04

a C = cluster.
b Eh = half-cell potential.
c [.] = ion concentration.

Table 1b
The estimated parameters of the emission distributions.

                 C1                      C2                      C3
Property         Mean        Variance    Mean        Variance    Mean        Variance
Eh               -2.65E+02   3.29E+03    -2.24E+02   2.76E+03    -2.97E+02   2.06E+03
Resistivity      8.20E+03    2.67E+07    2.08E+03    1.72E+06    6.17E+04    3.31E+09
pH               5.25E+00    3.71E-01    6.16E+00    3.37E-01    4.38E+00    6.45E-01
[CO3^2-]         1.12E+00    6.61E-01    1.17E+00    4.60E-01    9.76E-02    1.14E-01
[HCO3^-]         2.74E+00    2.98E+00    3.03E+00    2.94E+00    1.87E+00    1.12E+00
[Cl^-]           2.77E+00    6.68E-01    3.69E+00    7.94E-01    2.09E+00    1.62E-01
[SO4^2-]         6.91E-02    1.69E-03    1.01E-01    3.10E-03    5.05E-02    3.54E-04

When the convergence criteria are satisfied, the initial configuration $x^0$ is determined by Eq. (25):

$$ x_j^0 = \arg\max_{l \in L} \{P(l \mid y_j)\} \tag{25} $$

Without loss of generality, the ICM-EM algorithm for estimating the actual configuration $x$ and the parameters of each component distribution is shown in Fig. 1.

2.5. The number of components

A proper number of components can cause the data points to have high similarities within each cluster and low inter-similarities between clusters within the feature space [20,26]. Several studies have been performed on model selection, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC) [28], the heterogeneity test [16], and a Bayesian approach combined with Monte Carlo integration [31]. The BIC considers the dimension of the input space by including the number of features as a penalty [28,29], and thus the BIC is more sensitive to model complexity. We used the BIC in the present study in order to determine the number of components. The BIC has the form:

$$ \mathrm{BIC} = 2\,\mathrm{loglike}(x, \theta) - M \log(n) \tag{26} $$

where loglike(x, θ) is the maximized log-likelihood, $M$ is the number of independent parameters to be estimated, and $n$ is the number of data points. Previous studies recommend choosing the model at the maximum BIC [28,29,32,33]; however, the number of components chosen could be too large or too small, and this could lead to overfitting or to misclassification. Fraley and Raftery [28] suggested selecting the number of components where the BICs for all covariance models seem to converge. For the present study, we chose the optimal number of clusters at the point where the increment of the BIC begins to slow down. Because spatial correlation is not reflected in the feature space, we are able to choose a number that can maximize the likelihood of observations without considering spatial constraint. We employ a conventional FGM model to find the best number of components, since it avoids performing ICM within each iterative loop and thus makes it faster to try many cases with different numbers of components. We will provide further illustration in Section 4.

3. Numerical experiment

3.1. Simulation of hidden Markov random field

Numerical simulation is often used to verify stochastic models. In the context of assessing soil corrosivity, we consider that there are several different groups of corrosion defects, and the set of all external corrosion defects within a pipeline interval is a Markov random field $p(x)$ with the state set $L = \{1, 2, 3, \ldots, l\}$. Each defect with its corresponding soil features indicates the local soil corrosivity. Simulation is used to generate a synthetic grouping configuration $x = (x_1, x_2, x_3, \ldots, x_s)$, and we consider this as the original configuration that is used for comparison with the estimated result from the HMRF-FGM model. Here, a Gibbs sampler proposed by Geman and Geman [22] is applied. Based on the generated realization $x = (x_1, x_2, x_3, \ldots, x_s)$ of $p(x)$, at each site, the observation $y_j$ (i.e. a set of soil properties) is sampled from the conditional Gaussian emission distribution $f(y_j \mid \theta_l; x_j = l)$. Seven well-known critical soil properties related to corrosion are considered in this study: half-cell potential, soil resistivity, pH, and the concentrations of four different ions ($\mathrm{CO_3^{2-}}$, $\mathrm{HCO_3^{-}}$, $\mathrm{Cl^{-}}$, $\mathrm{SO_4^{2-}}$). All the distribution parameters (i.e. means and covariance structures) related to the seven properties are estimated from a data set extracted from a real-world survey along a pipeline interval. Fig. 2 shows an example of a synthetic configuration and its emission realization (i.e. the simulation results of soil properties), in which the random field has four different clusters. In a real-world problem, the emission realization $y$ of the random field $p(y)$ is obtained by soil survey at in-situ sites, and we want to reveal the hidden state sequence $x = (x_1, x_2, x_3, \ldots, x_s)$.
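The simulation procedure described above (draw a label configuration with a Gibbs sampler, then draw emissions given the labels) can be sketched as follows. This is a toy version under explicit assumptions: a single resistivity-like feature instead of seven soil properties, hypothetical cluster parameters, a Potts prior with a fixed `coupling` standing for $\beta/T$, and illustrative function names.

```python
import math
import random

def gibbs_sample_labels(s, labels, coupling, lam, sweeps, rng):
    """Draw a label configuration from a 1-D Potts prior by Gibbs sampling."""
    x = [rng.choice(labels) for _ in range(s)]
    for _ in range(sweeps):
        for j in range(s):
            nbrs = [i for i in range(max(0, j - lam), min(s, j + lam + 1)) if i != j]
            # Unnormalised conditional P(l | neighbors); choices() normalises the weights.
            weights = [math.exp(coupling * sum(x[i] == l for i in nbrs)) for l in labels]
            x[j] = rng.choices(labels, weights=weights)[0]
    return x

def sample_emissions(x, params, rng):
    """Draw one Gaussian observation per site given its cluster label."""
    return [rng.gauss(params[l][0], math.sqrt(params[l][1])) for l in x]

rng = random.Random(0)
labels = [1, 2, 3, 4]
x = gibbs_sample_labels(s=500, labels=labels, coupling=1.5, lam=2, sweeps=50, rng=rng)
# Hypothetical (mean, variance) of a single soil feature for each cluster.
params = {1: (2.0, 0.1), 2: (3.0, 0.1), 3: (4.0, 0.1), 4: (5.0, 0.1)}
y = sample_emissions(x, params, rng)
same = sum(a == b for a, b in zip(x, x[1:])) / (len(x) - 1)
print(round(same, 3))   # fraction of consecutive sites that share a label
```

With independent uniform labels the printed fraction would be about 0.25 for four clusters; a positive coupling should push it well above that, giving the piecewise-constant look of a synthetic configuration such as the one in Fig. 2(a).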

Fig. 4. Classification results via FGM and HMRF-FGM for three different covariance structures (cluster ID versus zone ID), where $\Sigma_0$ is the reference covariance structure: (a) original classification; (b) $\Sigma = 0.5\Sigma_0$; (c) $\Sigma = 4\Sigma_0$; (d) $\Sigma = 20\Sigma_0$; (e) effect of covariance structure on the accuracy of clustering results (MCR versus multiple of the covariance matrix $\Sigma_0$, for HMRF-FGM and FGM).

3.2. The effect of the order λ on clustering result

In implementing the HMRF-FGM model, it is important to choose a proper neighborhood system. In the context of a one-dimensional Markov random field, it is the parameter $\lambda$ that determines the size of neighborhoods. To measure the quality of a classification, we define the misclassification ratio (MCR) in the form:

$$ \mathrm{MCR} = \frac{\text{number of misclassified sites}}{\text{total number of sites}} \tag{27} $$
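Eq. (27) translates directly into code. The sketch below assumes that the estimated cluster IDs have already been matched to the original ones; because cluster labels from an unsupervised method are arbitrary, a real comparison would first need such a matching step, which is not shown here.

```python
def misclassification_ratio(estimated, original):
    """MCR of Eq. (27): share of sites whose estimated label differs from the original."""
    if len(estimated) != len(original):
        raise ValueError("both label sequences must cover the same sites")
    wrong = sum(1 for e, o in zip(estimated, original) if e != o)
    return wrong / len(original)

print(misclassification_ratio([1, 1, 2, 2], [1, 2, 2, 2]))
```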

Cluster analyses with $\lambda$ ranging from 1 to 7 are performed. Fig. 3 shows some of the classification results and the relationship between $\lambda$ and MCR. It is worth noting that in Fig. 3, a relatively small $\lambda$ can preserve more "details" of the original classification, which means that some special sites that are singular compared with their neighbors can be detected and reflected in the estimated classification. However, as $\lambda$ increases, the spatial constraint is strengthened, which results in a more consistent classification; hence, some singular sites may be misclassified. Based on the result illustrated in Fig. 3(e), a good choice of $\lambda$ would be smaller than 4; in this paper, $\lambda = 2$ is adopted. The original and estimated parameters of emission distributions when $\lambda = 2$ are shown in Tables 1a and 1b. The estimated results are acceptable when compared to the original parameters.

3.3. The robustness of ICM-EM algorithm

Another issue we confront when we employ a clustering method is that the classification quality may deteriorate as the variance of observations increases and becomes comparable with the difference between the means of different clusters. Some clustering approaches may be sensitive to the extent of scatter of the data points in feature space under some conditions [25,28]; these approaches include k-means clustering, FGM modeling, and density-based clustering techniques. One possible reason for this sensitivity is that these approaches lack a consideration of spatial constraint.

Fig. 5. Measured defect depth from ILI reports of 2005 and 2010: (a) density histograms of the defect depth (%WT) extracted from the ILI reports of the two years (2005 MFL and 2010 UT); (b) scatter plot of the reported metal loss, defect depth (%WT) versus mileage (m), along the pipeline interval, with one zone marked "This zone is unable to implement ILI inspection" and another marked "Only MFL is conducted in this zone".

Because the HMRF-FGM model applies a spatial constraint to each site via a neighborhood system, the ICM-EM algorithm is more robust than conventional clustering methods. To demonstrate this advantage, a stochastic simulation was performed using a given synthetic state configuration but different covariance structures of the emission functions. In this numerical experiment, the reference covariance structure Σ was estimated from the real-world survey data mentioned above. Seven different covariance structures were generated by multiplying the reference covariance structure by a constant. Some classification results via FGM and HMRF-FGM are shown in Fig. 4. The FGM classifications become mixed together as the variances of the random variables increase, as can be seen in Fig. 4(a)–(d). The HMRF-FGM model, however, preserves the spatial configuration well as the statistical separability deteriorates. Fig. 4(e) compares the MCR of the FGM and HMRF-FGM results and shows that HMRF-FGM is more robust than the finite Gaussian mixture model.
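The idea behind this experiment can be sketched in a much reduced form. The code below is not the ICM-EM algorithm of this paper: it assumes a single scalar feature, two clusters with known means and standard deviation (so there is no EM step), a Potts-type penalty with a hypothetical weight `beta`, and a neighborhood made of the k nearest sites on each side. It compares a pointwise nearest-mean classification with an iterated conditional modes (ICM) pass as the noise level is scaled up.

```python
import random


def simulate(labels, means, sigma, rng):
    """Draw one scalar observation per site from a Gaussian emission."""
    return [rng.gauss(means[s], sigma) for s in labels]


def pointwise_classify(obs, means):
    """Assign each site to the nearest mean (no spatial constraint)."""
    return [min(range(len(means)), key=lambda c: (y - means[c]) ** 2) for y in obs]


def icm_classify(obs, means, sigma, k=2, beta=1.0, sweeps=10):
    """ICM on a 1-D chain: Gaussian data term plus a Potts penalty `beta`
    for every neighbor (k sites on each side) carrying a different label."""
    labels = pointwise_classify(obs, means)
    n = len(obs)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            nbrs = [j for j in range(max(0, i - k), min(n, i + k + 1)) if j != i]

            def energy(c):
                data = (obs[i] - means[c]) ** 2 / (2.0 * sigma ** 2)
                prior = beta * sum(1 for j in nbrs if labels[j] != c)
                return data + prior

            best = min(range(len(means)), key=energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


def mcr(true_labels, est_labels):
    """Misclassification ratio, Eq. (27)."""
    return sum(t != e for t, e in zip(true_labels, est_labels)) / len(true_labels)


rng = random.Random(0)
truth = [0] * 200 + [1] * 200 + [0] * 200   # synthetic state configuration
means = [0.0, 1.0]                          # hypothetical cluster means
results = {}
for sigma in (0.05, 1.0):                   # well separated vs. strongly overlapping
    obs = simulate(truth, means, sigma, rng)
    results[sigma] = (
        mcr(truth, pointwise_classify(obs, means)),
        mcr(truth, icm_classify(obs, means, sigma)),
    )
    print(f"sigma = {sigma}: pointwise MCR = {results[sigma][0]:.3f}, "
          f"ICM MCR = {results[sigma][1]:.3f}")
```

When the clusters are well separated the two classifications agree; when the standard deviation is comparable with the difference of the means, the spatial penalty removes many of the isolated misclassifications, in line with the behavior reported in Fig. 4.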

4. Example

4.1. General information

As an example, we apply the HMRF-FGM model to a real-life underground pipeline system. The pipeline was originally installed in 1969. The total length of the inspected interval is about 110 km. The diameter is 457.2 mm, and the wall thickness is 6.41 mm. Two in-line inspections (ILI) were conducted using two different technologies: magnetic flux leakage (MFL) in 2005 and ultrasonic testing (UT) in 2010. ILI was not implemented on one sub-interval of the pipeline system in either 2005 or 2010. The inspection reports include the location, type and size of the defects. A total of 1132 external defects were reported in 2005, while 1931 external defects were reported in 2010. The histograms of defect depth and the location information are shown in Fig. 5, in which %WT represents the defect depth expressed as a fraction of the wall thickness; it ranges from 0 (for a wall that is completely intact) to 1 (for a wall that has failed). It is interesting to note that the histogram of the 2005 data appears to follow an exponential distribution; however, no defect with a depth of less than 10% of the wall thickness was detected. This is because the detectability of the MFL inspection device is limited: when the defect depth is smaller than a threshold level, the probability of detection is so low that a shallow corrosion defect is nearly impossible to detect [34]. Hence, the data from 2005 are truncated, and the exponential assumption may not be true. The data from 2010 are incomplete, as some segments were not inspected in 2010.

A soil survey was conducted every 200 m along the right-of-way of the pipeline in 2012; we refer to the survey points as stations. Seven soil properties (i.e. half-cell potential, resistivity, pH, and the concentrations of CO₃²⁻, HCO₃⁻, Cl⁻, and SO₄²⁻) and some other soil properties were either measured or estimated. After performing correlation analysis, some properties that were highly dependent on other properties were excluded. Only the seven soil properties listed above are taken into consideration in the analysis. Three critical soil properties (i.e. resistivity, pH and

[Fig. 6 plots: three stacked panels versus mileage (×10^4 m): resistivity (Ω·cm, ×10^5), pH, and [Cl⁻] (mmol/L).]

Fig. 6. Three critical soil properties along the right-of-way of the pipeline structure.

[Fig. 7 plot: BIC (×10^4) versus number of components for Test sets 1 to 5.]

Fig. 7. The number of components vs. BIC for the covariance structure of diagonal type with varying volume and shape for ﬁve test sets.

concentration of Cl⁻) are shown in Fig. 6 to illustrate the heterogeneity of the soil environment along the pipeline structure.

4.2. Clustering procedure

In the clustering procedure, only soil properties and location information are used. Each data point represents a corrosion defect associated with the seven soil properties, which were estimated by taking the average of the survey data from the two nearest stations. When using the FGM model to determine the number of components, the data are assumed to be generated from k mixed multivariate Gaussian distributions with different parameters (here k denotes the number of mixture components, not the neighborhood order of Section 3.2). We ran trials from k = 2 to k = 20 using five different test sets. Most of the BIC curves show a clear concave-upward shape. The minimum BIC of the different test sets is shown in Fig. 7. Test sets 1, 2 and 5 achieved their minimum BIC at k = 5 for the diagonal covariance model with varying volume and shape. This suggests that five clusters are most suitable for our data. Therefore, we choose k = 5 as the number of components.

Both the FGM and HMRF-FGM models were applied to our data set. The classification results for 2005 and 2010 are shown in Fig. 8. As expected, the external corrosion propagation is heterogeneous along the pipeline and can be clustered into five groups. The FGM model cannot reflect the spatial constraint, so its classes are mixed together in physical space (i.e. the one-dimensional space along the pipeline structure). However, the segments become very clear once the HMRF-FGM model is applied. We prefer the HMRF-FGM classification results for the soil corrosivity assessment because soil properties usually do not fluctuate rapidly within a small area, and hence the defects are believed to be spatially correlated. The centers (i.e. the means) of the soil properties in feature space for the different clusters are shown in Table 2.
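The selection of the number of components described above can be sketched as follows. The sketch assumes the BIC convention in which lower values are better, BIC = −2 ln L + p ln n, and one diagonal covariance matrix per component with freely varying entries; the parameter count and the function names are illustrative assumptions. The maximized log-likelihoods would come from fitting the FGM model for each candidate number of components and are not reproduced here.

```python
import math


def n_free_params_diag(n_components, n_features):
    """Free parameters of a Gaussian mixture with one diagonal covariance
    matrix per component: mixing weights, means and variances."""
    weights = n_components - 1
    means = n_components * n_features
    variances = n_components * n_features
    return weights + means + variances


def bic(log_likelihood, n_params, n_samples):
    """BIC in the 'lower is better' convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_n_components(log_likelihoods, n_features, n_samples):
    """Return the candidate with the smallest BIC and all BIC scores.

    `log_likelihoods` maps each candidate number of components to the
    maximized log-likelihood of the corresponding fitted mixture."""
    scores = {
        k: bic(ll, n_free_params_diag(k, n_features), n_samples)
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Five components and seven soil properties, as adopted in this section.
print(n_free_params_diag(5, 7))  # 74
```

The penalty term p ln n grows with the number of components, which is why the BIC curve eventually rises again after the likelihood gain from additional components levels off.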

[Fig. 8 plots: cluster ID (1 to 5) versus defect ID for the FGM and HMRF-FGM classifications; (a) 2005 data, (b) 2010 data.]

Fig. 8. Comparison of clustering results via FGM and HMRF-FGM: (a) 2005 data; (b) 2010 data. Defects are numbered consecutively from the start station to the end station along the inspected pipeline interval.

4.3. Corrosivity assessment

The corrosion depth measured by ILI is taken as the indicator of soil corrosivity. Within each cluster, the soil properties are assumed to be homogeneous. The histograms of metal loss for the different clusters are depicted in Fig. 9. It can be seen from this figure that the distribution of metal loss evolves with time. Since the 2005 data are truncated and we have no information on defects shallower than 0.1 (%WT), the evolution of corrosion propagation in each cluster may not be illustrated well. However, the variance of the defect depth can be related to the corrosion rate. This is because, within a given cluster, different defects have different starting times and hence different histories of corrosion propagation. A faster corrosion rate may lead to greater depths for older defects, while new defects continue to be generated as shallow pits. Cluster 2 in Fig. 9 illustrates this phenomenon very well. In a moderate soil environment, a segment shows less difference in corrosion propagation between old defects and young ones, as can be seen for Cluster 3 and Cluster 4 in Fig. 9.

To compare the soil corrosivity of any two clusters, we adopt the median as the representative statistic because it is more robust in the current context. Fig. 10 shows the estimated medians for the five clusters. The 95% confidence intervals, obtained with a bootstrap methodology, are represented by error bars. The varying widths of the confidence intervals imply that the uncertainty of the median varies along the pipeline. This is partly because the cluster sizes vary considerably: a smaller cluster usually leads to higher uncertainty in the estimators (e.g. Cluster 4 from the 2005 data). Fig. 10 indicates that Cluster 2 has the highest corrosion rate, while Clusters 3 and 4 may have equivalent corrosion rates. Similarly, Clusters 1 and 5 may have equivalent corrosion rates, and these are higher than the rates in Clusters 3 and 4. It is worth noting that although the soil properties differ between clusters, the corrosion propagation may be similar because of the complexity of the corrosion process under in-situ conditions.
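The effect of the detection threshold on the 2005 data, noted earlier in this subsection, can be illustrated with a synthetic sketch. It assumes, purely for illustration, that true defect depths are exponentially distributed with a hypothetical rate and that every defect shallower than 0.1 goes undetected (a hard threshold rather than a gradual probability of detection); depths are not capped at 1. By the memoryless property of the exponential distribution, the excess depth above the threshold is again exponential with the same rate, so subtracting the threshold before estimating recovers the rate, whereas ignoring the truncation does not.

```python
import random


def simulate_detected_depths(rate, threshold, n_defects, rng):
    """Draw exponential depths and keep only those at or above the threshold."""
    depths = (rng.expovariate(rate) for _ in range(n_defects))
    return [d for d in depths if d >= threshold]


def naive_rate(depths):
    """Exponential rate estimate that ignores the truncation: 1 / mean depth."""
    return len(depths) / sum(depths)


def truncation_aware_rate(depths, threshold):
    """Maximum likelihood rate for a left-truncated exponential sample:
    1 / mean excess above the threshold."""
    return len(depths) / sum(d - threshold for d in depths)


rng = random.Random(1)
TRUE_RATE = 8.0    # hypothetical rate, chosen only for this illustration
THRESHOLD = 0.1    # detection limit quoted for the MFL data

detected = simulate_detected_depths(TRUE_RATE, THRESHOLD, 5000, rng)
naive = naive_rate(detected)
aware = truncation_aware_rate(detected, THRESHOLD)
print(f"detected {len(detected)} of 5000 synthetic defects")
print(f"naive rate estimate:            {naive:.2f}")
print(f"truncation-aware rate estimate: {aware:.2f} (true rate {TRUE_RATE})")
```

The naive estimate is pulled well below the true rate because every retained depth includes the threshold offset, which is one way in which truncated inspection data can bias an estimated depth distribution.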

5. Discussion

5.1. Clustering-based inspection strategy

Based on the clustering result and the corrosivity assessment, different inspection strategies can be applied to different segments of a pipeline structure. The concept is that segments with a high corrosion rate should receive routine (or more frequent) inspections, both to obtain more information about the evolution of corrosion depth and to locate and replace failed segments. For segments with moderate corrosion propagation, a relatively longer interval between inspections could be adopted. In practice, a rational optimal inspection scheme should be based on a stochastic model of corrosion growth [15,35] and system reliability analysis [36], which is outside the scope of this study. The clustering approach can provide categorized data that can be further processed with model calibration and system reliability analysis, so that the estimated reliability for current and future conditions is time- and position-dependent. An optimized inspection scheme can then be formed according to the tolerance of risk.

Table 2
The centers of soil properties for the five clusters.

Property      C1         C2         C3         C4         C5
Eh            1.61E+02   2.38E+02   3.17E+02   3.38E+02   2.85E+02
Resistivity   3.96E+03   6.18E+03   6.66E+03   1.74E+04   5.12E+04
pH            6.24E+00   5.89E+00   4.97E+00   5.17E+00   4.59E+00
[CO₃²⁻]       7.52E-01   1.80E+00   6.15E-01   1.18E+00   5.75E-01
[HCO₃⁻]       3.48E+00   2.76E+00   1.64E+00   2.57E+00   2.70E+00
[Cl⁻]         4.78E+00   3.85E+00   2.37E+00   2.55E+00   2.36E+00
[SO₄²⁻]       1.78E-01   1.33E-01   6.32E-02   6.24E-02   6.40E-02

[Fig. 9 plots: (a)–(e) density histograms of defect depth (%WT) for Clusters 1 to 5, each comparing the 2005 MFL and 2010 UT data; (f) defect depth (%WT) versus mileage (×10^4 m) by cluster and inspection year, annotated "This zone is unable to implement ILI inspection" and "Only MFL is conducted in this zone".]

Fig. 9. Clustering result of HMRF-FGM: density histograms of metal loss for (a) Cluster 1; (b) Cluster 2; (c) Cluster 3; (d) Cluster 4; (e) Cluster 5; and (f) scatter plot of the reported metal loss along the pipeline interval. Note: "C1-2005 MFL" means Cluster 1 of the 2005 data obtained with MFL technology.

[Fig. 10 plot: median defect depth (%WT) with error bars versus cluster ID (C1 to C5) for the 2005 and 2010 data.]

Fig. 10. Estimation results of 95% conﬁdence intervals of medians for the ﬁve clusters.
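The confidence intervals in Fig. 10 are obtained by bootstrapping; the variant used is not stated. As a minimal sketch, assuming a percentile bootstrap and synthetic depths (the data, the sample sizes and the function name below are hypothetical), the interval for a cluster median could be computed as follows.

```python
import random
import statistics


def bootstrap_median_ci(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = medians[int(alpha * n_resamples)]
    upper = medians[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper


# Synthetic depths above a 0.1 detection limit; not data from the inspection reports.
rng = random.Random(42)
large_cluster = [0.1 + rng.expovariate(10.0) for _ in range(401)]
small_cluster = large_cluster[:25]

for name, depths in (("large cluster", large_cluster), ("small cluster", small_cluster)):
    lo, hi = bootstrap_median_ci(depths)
    print(f"{name}: n = {len(depths)}, median = {statistics.median(depths):.3f}, "
          f"95% CI = ({lo:.3f}, {hi:.3f})")
```

A small sample typically yields a wider interval than a large one, which is consistent with the larger error bars reported for the smaller clusters.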

5.2. The detectability and error of ILI device

The MFL data are truncated at metal loss = 0.1 due to the detection limit of the MFL device, which results in missing data in the inspection report. Although very shallow defects are trivial for integrity assessment, information about shallow defects is essential for analyzing the evolution of defect growth. Moreover, the truncated data may lead to a biased estimate of the probability density function of the corrosion depth. UT equipment likewise suffers from intrinsic measurement errors [37]. While UT is more sensitive to shallow defects owing to its physical principle, it cannot measure deep defects accurately, since the second echo is difficult to detect for a very thin remaining wall [38]. In addition, it is known that ILI data are affected by systematic and random errors [39]. Therefore, a more realistic assessment of external corrosion should be based on corrected inspection data. This aspect is beyond the scope of the present paper, but it cannot be ignored.

5.3. The density of defects

While Cluster 2 has the highest corrosion rate, a higher number of corrosion defects are detected in Cluster 3 and Cluster 4. The topography of the studied pipeline interval, which ranges in elevation from 48 m to 2498 m, seems to be responsible for the frequency of the defects, rather than the level of soil corrosivity, because of (1) initial damage during construction and (2) the structure-soil interaction. During the installation of the pipe and during the service period, the coating may easily be damaged by the rock and gravel under the pipe in mountainous terrain (from the 58 km point along the pipeline to the 110 km point). About 65% of the defects at pipeline stations between 58 km and 110 km are located at the bottom surface of the pipeline, but only about 30% of defects occur at the bottom for the remaining stations. This makes geotechnical factors a plausible explanation for the high frequency of detected defects. Therefore, the density of defects at a certain location does not directly reflect the level of corrosivity of the soil.

6. Conclusion

In this study, a clustering approach based on a hidden Markov random field is established for assessing external corrosion in a buried pipeline. The combination of hidden Markov random field theory and a finite mixture model applies a spatial constraint to the soil survey data, resulting in a classification that is more realistic and easier to interpret in terms of spatial correlation. The clustering analysis can help reveal the underlying pattern of the soil properties and categorize the corrosion defects by searching for homogeneous segments in a heterogeneous random field of soil properties. The numerical simulation showed that different neighborhood systems impose different spatial constraints on each site and that, usually, a relatively weaker field (i.e. p(x) with small k) is preferred since it is more flexible than a stronger field. In addition, the HMRF-FGM model is found to be more robust than the conventional FGM model; hence the estimation result for the HMRF-FGM model should be more reliable. To illustrate the application of the approach, it was applied to a real-life pipeline structure. The variation of corrosion propagation along the pipeline interval is related to the heterogeneous soil environment along the right-of-way of the pipeline.
The categorized data reveal that different soil environments may lead to similar corrosion propagation, which highlights the difficulty of assessing soil corrosivity levels using only in-situ soil survey information. The integration of soil survey data and in-line inspection data is shown to be a better scheme, and this scheme can be incorporated into a clustering-based inspection strategy that can improve and guide maintenance practice.

Acknowledgements

The authors would like to acknowledge the Secretaría de Energía (SENER) and the Consejo Nacional de Ciencia y Tecnología (CONACYT) in Mexico for the financial support to perform the work in this project (under SENER-CONACYT contract number 159913).

References

[1] Abes JA, Salinas JJ, Rogers J. Risk assessment methodology for pipeline systems. Struct Saf 1985;2:225–37.
[2] Engelhardt GR, Macdonald DD. Unification of the deterministic and statistical approaches for predicting localized corrosion damage. I. Theoretical foundation. Corros Sci 2004;46:2755–80.
[3] Romanoff M. Underground corrosion. US Government Printing Office; 1957.
[4] Shibata T. 1996 WR Whitney Award Lecture: statistical and stochastic approaches to localized corrosion. Corrosion 1996;52:813–30.
[5] Velazquez JC, Caleyo F, Valor A, Hallen JM. Predictive model for pitting corrosion in buried oil and gas pipelines. Corrosion 2009;65:332–42.
[6] Provan J, Rodriguez III E. Part I: Development of a Markov description of pitting corrosion. Corrosion 1989;45:178–92.
[7] Melchers RE. Representation of uncertainty in maximum depth of marine corrosion pits. Struct Saf 2005;27:322–34.
[8] Kamrunnahar M, Urquidi-Macdonald M. Data mining of experimental corrosion data using Neural Network. ECS Trans 2006;1:71–9.
[9] Sinha SK, Pandey MD. Probabilistic neural network for reliability assessment of oil and gas pipelines. Comput Aided Civil Infrastruct Eng 2002;17:320–9.
[10] Melchers RE. Progression of pitting corrosion and structural reliability of welded steel pipelines. Oil and Gas Pipelines: Integrity and Safety Handbook 2015:327–42.
[11] Frankel G. Pitting corrosion of metals: a review of the critical factors. J Electrochem Soc 1998;145:2186–98.
[12] Ferreira CAM, Ponciano JA, Vaitsman DS, Pérez DV. Evaluation of the corrosivity of the soil through its chemical composition. Sci Total Environ 2007;388:250–5.
[13] Hong H. Inspection and maintenance planning of pipeline under external corrosion considering generation of new defects. Struct Saf 1999;21:203–22.
[14] Alamilla JL, Sosa E. Stochastic modelling of corrosion damage propagation in active sites from field inspection data. Corros Sci 2008;50:1811–9.
[15] Zhang S, Zhou W, Qin H. Inverse Gaussian process-based corrosion growth model for energy pipelines considering the sizing error in inspection data. Corros Sci 2013;73:309–20.
[16] Hong H, Ye W. Estimating extreme wind speed based on regional frequency analysis. Struct Saf 2014;47:67–77.
[17] Pinto J, Blockley D, Woodman N. The risk of vulnerable failure. Struct Saf 2002;24:107–22.
[18] Agarwal J, Blockley D, Woodman N. Vulnerability of structural systems. Struct Saf 2003;25:263–86.
[19] Wang H, Yajima A, Liang R, Castaneda H. Bayesian modeling of external corrosion in underground pipelines based on the integration of Markov chain Monte Carlo techniques and clustered inspection data. Comput Aided Civil Infrastruct Eng 2014;30:300–16.
[20] McLachlan G, Peel D. Finite mixture models. John Wiley & Sons; 2004.
[21] Chatzis SP, Varvarigou TA. A fuzzy clustering approach toward hidden Markov random field models for enhanced spatially constrained image segmentation. IEEE Trans Fuzzy Syst 2008;16:1351–61.
[22] Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 1984;PAMI-6:721–41.
[23] Besag J. On the statistical analysis of dirty pictures. J Roy Stat Soc 1986;48:259–302.
[24] Li SZ. Markov random field modeling in computer vision. New York: Springer-Verlag, Inc.; 1995.
[25] Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 2001;20:45–57.
[26] Celeux G, Govaert G. Gaussian parsimonious clustering models. Pattern Recogn 1995;28:781–93.
[27] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 1977;39:1–38.
[28] Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 1998;41:578–88.
[29] Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 2002;97:611–31.
[30] Qian W, Titterington D. Multidimensional Markov chain models for image textures. J Roy Stat Soc: Ser B (Methodol) 1991:661–74.
[31] Yuen KV, Mu HQ. Peak ground acceleration estimation by linear and nonlinear models with reduced order Monte Carlo simulation. Comput Aided Civil Infrastruct Eng 2011;26:30–47.
[32] McLachlan GJ, Basford KE. Mixture models: inference and applications to clustering. Statistics: Textbooks and Monographs. New York: Dekker; 1988.
[33] McLachlan GJ, Krishnan T. The EM algorithm and extensions. Wiley-Interscience; 2007.
[34] Rodriguez III E, Provan J. Part II: development of a general failure control system for estimating the reliability of deteriorating structures. Corrosion 1989;45:193–206.
[35] Gomes WJ, Beck AT. Optimal inspection and design of onshore pipelines under external corrosion process. Struct Saf 2014;47:48–58.
[36] Zhou W. System reliability of corroding pipelines. Int J Press Vessels Pip 2010;87:587–95.
[37] Wang H, Yajima A, Liang RY, Castaneda H. A Bayesian model framework for calibrating ultrasonic in-line inspection data and estimating actual external corrosion depth in buried pipeline utilizing a clustering technique. Struct Saf 2015;54:19–31.
[38] Goedecke H. Ultrasonic or MFL inspection: which technology is better for you. Pipeline Gas J 2003;230:34–41.
[39] Caleyo F, Alfonso L, Espina-Hernández J, Hallen J. Criteria for performance assessment and calibration of in-line inspections of oil and gas pipelines. Meas Sci Technol 2007;18:1787–99.
