One-dimensional pairwise CNN for the global alignment of two DNA sequences

Neurocomputing 149 (2015) 505–514 Contents lists available at ScienceDirect Neurocomputing journal homepage: www.elsevier.com/locate/neucom One-dim...

Download PDF

703KB Sizes 0 Downloads 3 Views

Report

PDF Reader
Full Text

Neurocomputing 149 (2015) 505–514

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

One-dimensional pairwise CNN for the global alignment of two DNA sequences Luping Ji n, Xiaorong Pu, Hong Qu, Guisong Liu School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, PR China

art ic l e i nf o

a b s t r a c t

Article history: Received 17 September 2013 Received in revised form 30 July 2014 Accepted 7 August 2014 Communicated by R.W. Newcomb Available online 23 August 2014

The cellular neural network (CNN) is one of the classic artiﬁcial neural networks. During the past decades, the one-dimensional CNN models and their applications have not yet been paid enough enthusiasm too. For this reason, this paper proposes a simpliﬁed one-dimensional CNN model and then designs a pairwise network using this model to demonstrate its applicability. This pairwise CNN consists of two parallel one-dimensional CNNs, a ﬁxed master and a movable slave. Using this pairwise CNN, an algorithm is developed to perform the global alignment of two DNA sequences. In this algorithm, the slave moves forward step by step, and the cell states of the master are computed in the meanwhile. Based on all the states obtained in all time steps, a state selection array is generated then a global alignment path is determined from this array. Under the guidance of the alignment path, two DNA sequences are globally aligned by inserting blank spaces in the appropriate positions of these two sequences. Experiments on aligning the DNA sequences from the publicly available databases of the NCBI with this method are carried out in this paper and compared with the other two methods. Through evaluating computation time and similarity, these experiments prove that the proposed onedimensional CNN model is effective, and the alignment algorithm based on a pairwise CNN of the model is efﬁcient, obtaining higher similarity with less computation time than the other two. & 2014 Elsevier B.V. All rights reserved.

Keywords: Cellular neural network DNA sequences State selection array Global alignment Similarity computation

1. Introduction As one of the classic neural network models, the cellular neural network (CNN) was originally proposed by Chua and Yang [1,2]. From then on, CNNs have been attracting a great deal of enthusiasm of researchers in the world. During the past decades, most research work on CNNs focused on two main ﬁelds. One is on CNN theories, such as the stability analysis [3–7], the convergence analysis [8], the chaos [9,10] and the improved CNN models [11–15]; another is on CNN applications, such as the hardware implementations [16–18], the image processing [19–21], the pattern recognition [22,23] and the associative memories [24–26]. Like the other neural networks, CNNs have also all kinds of possible models with different dimensions. The publications during the past decades clearly show that the research on the twodimensional CNNs has attracted the overwhelming attentions of the researchers. As a result, obviously inadequate research efforts have been made on either studying some other CNN models such as the one-dimensional CNN [27], the three-dimensional CNN and the CNN even in higher dimensions or continuously exploring the potential applications of these models.

n

Corresponding author.

http://dx.doi.org/10.1016/j.neucom.2014.08.023 0925-2312/& 2014 Elsevier B.V. All rights reserved.

In general, the conventional two-dimensional CNN [1,2] model can be mathematically expressed in the following differential forms with the following rules of changing cell sates and outputs at time t: 8 dx ðtÞ 1 > > < C ij ¼ xij ðtÞ þ ∑ Akl ykl ðtÞ þ ∑ Bkl ukl þ I ij ; Rx dt kl A Nij ðrÞ kl A Nij ðrÞ > > : y ðtÞ ¼ f ðx ðtÞÞ; ij

ð1Þ

ij

where both the subscripts ‘ij’ and ‘kl’ represent the location coordinates of the cells in the CNN, Nij(r) represents a coordinate set, which consists of the position coordinates of all cells in the neighbor domain of Cði; jÞ, xij(t) represents the state of cell Cði; jÞ at time t, yij(t) denotes the cell output of Cði; jÞ at time t, and ukl, independent of the time t, is the external link input of the neighbor cell Cðk; lÞ in the neighbor domain of Cði; jÞ. Moreover, C and Rx are two constants, and Iij is an independent bias constant for cell Cði; jÞ. In addition, A and B are the feedback template and the control template, respectively. Correspondingly, Akl and Bkl are respectively the feedback coefﬁcient and the control coefﬁcient with the coordinate ’kl’ in the templates A and B for cell Cði; jÞ. In Eq. (1), the constraint conditions for the CNN include jxij ð0Þj r 1, juij j r 1, C 4 0, and Rx 4 0. Furthermore, the activation

506

L. Ji et al. / Neurocomputing 149 (2015) 505–514

function f(x) for the cell output is deﬁned as the following form: 1 f ðxij ðtÞÞ ¼ ðxij ðtÞ þ1 xij ðtÞ 1Þ 2 8 1; xij ðtÞ Z 1; > < ¼ xij ðtÞ; ∣xij ðtÞ∣o 1; > : 1; xij ðtÞ r 1:

ð2Þ

Besides those classic two-dimensional CNNs, some onedimensional CNN models have also been proposed, such as the discrete-time model of Manganaro et al. [14], and the two-layer model [27] by Takahashi et al. Moreover, the sufﬁcient conditions of the one-dimensional CNN for detecting the connected component are also analyzed by Takahashi et al. [28]. These onedimensional models exhibit many novel characteristics, and may be potentially applied to some engineering ﬁelds such as sequence alignment. However, almost no further research work was carried out since then. By now, the research on the one-dimensional CNNs has almost fallen into silence completely. During the past decades, as one of the most important research problems in bioinformatics [29], the similarity computation of DNA sequences has attracted many eye-sights of researchers in the world. As we know, DNA carrying the genetic secrets of livings consists of only the four different nucleotide bases: A, C, G and T. It is highly believed that, by similarity comparison on the DNA sequences obtained from different biological samples, their biological structure characteristics can be predicted in advance, their evolutionary paths can be traced, and their biological functions can also been accurately identiﬁed. In general, to complete the similarity computation of two DNA sequences, a global alignment process is extremely critical and even inevitably necessary. So far, many classic alignment algorithms have been developed to do the global alignment of two DNA sequences, such as the dot pot [30], the dynamic programming [29,31], and the heuristic algorithms such as FASTA [32] and BLAST [33]. Moreover, as another classic problem, the alignment of multiple DNA sequences is still a very difﬁcult NP problem, and it has not been well-solved yet though there have been some efﬁcient algorithms proposed to do it, for instance, the iteration algorithm in [34], the progressive alignment in [35], and the graph theory approach in [36]. Not considering the global alignment of multiple DNA sequences, this paper only addresses the global alignment problem of two DNA sequences. During the past decades, although CNNs have been widely used in some application ﬁelds such as the image processing [19], they have not been applied to the similarity computation of DNA sequences yet. This paper proposes a new one-dimensional CNN model differing from those traditional two-dimensional models, then designs a one-dimensional pairwise CNN using the proposed model. Based on the one-dimensional pairwise CNN, this paper also develops a global alignment and similarity computation algorithm for two DNA sequences. Signiﬁcantly differing from the conventional two-dimensional CNN models with row-column cell structure, the proposed onedimensional CNN model consists of only one cell chain, in which each cell only has two neighbor cells at the most at any time: a right one and a left one. Furthermore, the pairwise CNN consists of two parallel one-dimensional CNNs, a master sub-network and a slave sub-network. The master is ﬁxed, while the slave is movable. As running the pairwise CNN, the master sub-network always keeps immobile and the slave regularly moves forward a ﬁxed distance (the critical distance) at each step along the direction parallel to the master. Using the pairwise one-dimensional CNN, an alignment method is developed for two DNA sequences. First, it generates a state selection array according to the dynamic states and outputs of the

cells in the master at all time steps. Then, an alignment path is obtained by back-tracing the states in the array. Finally, under the guidance of the alignment path, two sequences are globally aligned by inserting some blank spaces at the appropriate positions. Moreover, in order to evaluate this algorithm, some numerical experiments on the DNA sequences from the publicly available NCBI databases have been conducted in this paper. Two typical criteria including similarity and computation time are adopted to numerically evaluate the algorithm performances in these experiments. The remainder of this paper is organized as follows. In Section 2, a simpliﬁed one-dimensional CNN structure model is proposed. Section 3 designs a one-dimensional pairwise CNN and develops an algorithm for the global alignment of two DNA sequences and similarity computation using the designed pairwise CNN. In order to evaluate the model and the algorithms, Section 4 shows a few numerical experiments on the NCBI databases, and analyzes the experiment results. Finally, Section 5 concludes this paper.

2. The proposed one-dimensional CNN model The conventional two-dimensional CNN models usually consist of a typical cell array with n rows and m columns, and each cell in the network is mutually connected to its neighbor cells [1,2]. Differing from the two-dimensional models, the proposed one-dimensional model is only made up of a linear cell chain with m cells which are arranged in a single row. Thus, any cell C(i), iA f0; 1; …; m 1g, in the one-dimensional network has only one direct link to each of its two neighbor cells, Cði 1Þ and Cðiþ 1Þ. Fig. 1(a) and (b) exhibits the brief network structure of the model with a linear cell arrangement and the single cell diagram, respectively. As shown in Fig. 1, xi ; ui and yi represent the cell state signal, the link input signal, and the cell output signal of the given cell C (i), respectively. f ðxi Þ is a modulation function formulated for the cell output, which is usually deﬁned as a piecewise-linear function. Moreover, A is a feedback template, which is usually used to modulate all the feedback inputs from the output signals of cell C (i) and its neighbors. Similarly, B represents a control template, which is usually used to modulate all the link-input signals from cell C(i) and its neighbors. Furthermore, both C and Rx are constant circuit parameters, which are usually determined empirically. Ii is the cell bias constant (or threshold), and xi ð0Þ indicates the initial cell state of C(i) at the time step t ¼0. Similar to the two-dimensional models, Eq. (1), the dynamic state of cell C(i) in the one-dimensional CNN can be brieﬂy

Fig. 1. Illustration of the proposed one-dimensional CNN model. (a) Linear chain network structure; (b) Single cell diagram.

L. Ji et al. / Neurocomputing 149 (2015) 505–514

described in the following forms: 8 dx ðtÞ 1 > : y ðtÞ ¼ f ðx ðtÞ; I Þ i

i

ð3Þ

i

1

if xi ðtÞ ¼ I i ;

0

else:

ð4Þ

Differing from Eq. (2), this function is not a piecewise-linear function. In fact, from the output characteristic's point of view, Eq. (4) is a typical two-valued function with a pulse output of the cell in the CNN. It means that if xi(t) reaches the threshold Ii at time t, the cell C(i) outputs 1, yi ðtÞ ¼ 1, otherwise no pulse outputs, yi ðtÞ ¼ 0. Moreover, of course, Eq. (3) still should follow both the two constraints jxi ð0Þj r1 and jui j r1. As shown in Fig. 1, the proposed one-dimensional CNN consists of a simple cell chain, so it has a much simpler cell neighbor domain than a two-dimensional CNN. Fig. 2 gives out a typical structure of the cell neighbor domain in the proposed onedimensional model. In the one-dimensional CNN above, cell C(i) has only the two links to its two neighbor cells, Cði 1Þ and Cði þ 1Þ. So the neighbor domain of C(i) consists of only the three cells Cði 1Þ, Cði þ 1Þ and itself, namely N i ðrÞ ¼ fi 1; i; i þ1g. Furthermore, C(i) just receives the two feedback inputs Ai 1 yi 1 and Ai þ 1 yi þ 1 from the outputs of its two neighbor cells Cði 1Þ and Cði þ 1Þ, respectively. These two feedback inputs are modulated by the feedback template A. At the same time, C(i) also receives the two cell link inputs Bi 1 ui 1 and Bi þ 1 ui þ 1 from its neighbors Cði 1Þ and Cði þ1Þ, respectively. These two cell link inputs are modulated by the control template B. In general, cell C(i) also receives the two extra signals from itself, the self-feedback input Ai yi and the self-link input Bi ui . For most of the time, Eq. (3) is usually used for depicting the state dynamics of the cells in a continuous-time CNN. However, in some special applications such as digital image processing, a discrete-time CNN is more suitable. From Eq. (3), the discrete dynamics of the one-dimensional model can be approximately deduced as follows: C

xi ðt þ ΔtÞ xi ðtÞ 1 ¼ xi ðtÞ þ ∑ Ak yk ðtÞ þ ∑ Bk Rx Δt k A N i ðrÞ k A N i ðrÞ uk þ I i

ð5Þ

Ai −1 i -2

ui −1

Bi × ui Bi −1

In Eq. (6), t is used to represent the discrete time step, t ¼ 0; 1; 2; 3; … . Moreover, all the other symbols or expressions in Eq. (6) have the same meanings as those deﬁned in Eq. (3).

3. The global alignment of two DNA sequences using the CNN Usually, a DNA sequence consists of only the four nucleotide bases: adenine, guanine, cytosine and thymine. In bioinformatics, these four nucleotide bases are usually abridged as the corresponding characters ‘A’, ‘G’, ‘C’ and ‘T’, respectively. In nature, a species has its own distinctive DNA sequences with an exclusive arrangement structure of nucleotide bases. It is the differences of the category, quantity and arrangement order of the nucleotide bases in DNA sequences that makes the DNA sequences different from each other. Moreover, the DNA sequence is also believed as the most direct and essential reason of the biological diversity in nature. Since a DNA sequence consists of only the four fundamental nucleotide bases, any given sequence can be equivalently transformed into a pure character sequence of fA; G; C; Tg by converting all the nucleotide bases using the corresponding characters to them. So, the global alignment of two DNA sequences can be equivalently transformed into the global alignment of two character sequences. The following content will show how to globally align a pair of DNA sequences using the one-dimensional CNN model, and how to evaluate the alignment performance. 3.1. The designed one-dimensional pairwise CNN As described in Section 2, the one-dimensional CNN model has a linear cell arrangement structure, as shown in Fig. 1(a). Each cell has only two link cells at the most, as illustrated in Fig. 1(b). These characteristics of the model can be effectively utilized to develop an alignment algorithm for two DNA sequences. In order to develop such an algorithm, ﬁrst a pairwise CNN is constructed using the one-dimensional model proposed in Section 2. The designed pairwise CNN is different from those traditional CNNs because it only consists of two separate one-dimensional CNNs, as shown in Fig. 3. Fig. 3 exhibits the pairwise network at the initial state, t ¼0. Structurally, this pairwise CNN consists of two separate onedimensional CNNs, the master sub-network CNN1 with m cells and the slave sub-network CNN2 with n cells. Furthermore, all cells of CNN1 are ﬁxed and represented as C 1 ðiÞ, i ¼ 0; 1; …; m 1, while the cells of CNN2 are movable and represented as C 2 ðjÞ, j ¼ 0; 1; …; n 1. From the point of view of the interrelation, as shown in Fig. 3, CNN2 is separate from CNN1 and also in parallel to CNN1. Moreover, the distance between two adjacent cells in a same CNN and the distance between CNN1 and CNN2 are ﬁxed as the critical distance r, same as that in Eq. (6). In addition, CNN1

Ai +1

yi −1 i -1

If let Δt ¼ 1, Eq. (5) is transformed into Eq. (6), which is a discrete model of the one-dimensional CNN: ! 1 1 xi ðtÞ þ ∑ Ak yk ðtÞ þ ∑ Bk uk þI i : xi ðt þ1Þ ¼ xi ðtÞ þ C Rx k A Ni ðrÞ k A N i ðrÞ ð6Þ

where t indicates the time step, and k represents the position coordinate of a cell in the one-dimensional CNN. Ni(r) represents the coordinate set of the neighbor cells of C(i), and r is deﬁned as the critical distance, beyond which any cell cannot have any link to its neighbors. Ak and Bk are two matrix coefﬁcients from the templates A and B, respectively. yk and uk are the outputs from the neighbor cell C(k) and the link input to C(k), respectively. Moreover, at time t, as shown in Eq. (4), the output modulation function f ðÞ is redeﬁned as yi ðtÞ ¼ f ðxi ðtÞ; I i Þ ¼

507

i

Ai × yi

CNN1 (master, fixed)

yi +1 i +1

0

r

i +2

ui +1 Bi +1

Fig. 2. The typical neighbor domain structure of the cell C(i).

n-1 …

1

r

1

…

i

… m-1

r

0

CNN 2 (slave, movable)

Fig. 3. The structure of one-dimensional pairwise CNN, at t¼ 0.

508

L. Ji et al. / Neurocomputing 149 (2015) 505–514

always keeps ﬁxed at any time step t, t ¼ 0; 1; …, while CNN2 regularly moves forward step by step along the direction parallel to CNN1 and moves forward a ﬁxed distance r at each step. Moreover, CNN1 is deﬁned as the work center (the master) and the dynamics of CNN2 are ignored in the pairwise CNN. As a result, CNN2 only plays a role of the link supplier (the slave) to CNN1 in the running of this pairwise CNN. As shown in Fig. 3, at time t¼0, no cell in CNN1 can link to CNN2 because the distance between C 1 ðiÞ and C 2 ðjÞ is bigger than the critical distance r. Since CNN2 moves forward r at each time step, at t¼1 cell C 2 ð0Þ exactly moves to the vertical location under C 1 ð0Þ. From then on, at each time step t, CNN2 will continue to move forward r. So at time t¼ m, C 2 ð0Þ exactly moves to the vertical location under C 1 ðm 1Þ, and at t ¼ m þ n 1 C 2 ðn 1Þ arrives in the location right under C 1 ðm 1Þ. Finally, at t ¼ m þ n, CNN2 completely divorces from CNN1, and no cell in CNN1 can link to CNN2, so the pairwise CNN stops at that time step. In order to clearly show the movement process of CNN2 in the running of the pairwise CNN, Fig. 4 illustrates the three critical transition positions of CNN2 moving forward respectively at the time steps t¼1, t¼m and t ¼ m þ n 1. In this ﬁgure, the arrows represent the links of a cell in CNN1 to its neighbor cells. Moreover, the cell of CNN1 can have the three typical neighbor domains, as shown in Fig. 5. In this ﬁgure, (a) represents the neighbor domain of C 1 ð0Þ, which consists of three cells: C 1 ð0Þ, C 1 ð1Þ and C 2 ð0Þ, and occurs at t¼ 1; (b) represents the neighbor domain of C 1 ðiÞ, 1 o i o m 1, which consists of the four cells: C 1 ðiÞ, C 1 ði 1Þ, C 1 ði þ 1Þ and C 2 ðjÞ, and occurs at 1 o t o m þ n 1; and (c) represents the neighbor domain of C 1 ðm 1Þ, which consists of

CNN1

n-1 …

1

1

0

CNN2

0

1

…

i

n-1 …

1

0

…

…

0

i

… m-1

i

… m-1 CNN1

1

0

CNN2

1

0

CNN2

Fig. 4. Three transition positions of CNN2: (a) at t¼ 1; (b) at t¼ m; and (c) at t ¼ m þ n 1.

u, y

u, y

0

1

i-1

u, y

u, y i

u, y

u, y 0

u, y

j

i +1

m-2

u, y

8 > 1 1 > > > < x1;i ðt þ 1Þ ¼ 1 CRx x1;i ðtÞ þ C > > > > : y1;i ðtÞ ¼ f ðx1;i ðtÞ; I 1;i Þ ¼

(

1 0

!

∑

Ak yl;k ðtÞ þ

C l ðkÞ A N 1;i ðr;tÞ

∑

Bk ul;k þ I 1;i

C l ðkÞ A N1;i ðr;tÞ

if x1;i ðtÞ ¼ I 1;i ; else;

ð7Þ where t is the time step, Cl(k) is the neighbor cell of C 1 ðiÞ, x1;i ðtÞ and y1;i ðtÞ represent the state and the output of C 1 ðiÞ at time t, respectively. yl;k ðtÞ and ul;k (independent of t) are the output and the link input to Cl(k) at time t, respectively. Moreover, deriving from Eq. (4), f ðx1;i ðtÞ; I 1;i Þ is still a two-valued function with the pulse output characteristic, thereinto I 1;i is designed as the pulse threshold for deciding the output of C 1 ðiÞ. In application, ul;k usually comes from the initialization of the CNN, and always keeps its value without any change all the time. For instance, it is initialized using the grey-scale values of the pixels in the image processing [23]. Similarly, in the alignment algorithm of this paper it will be initialized using the numerical codes of the DNA sequences. The remaining symbols in Eq. (7) have the same meanings as in Eq. (3). Because CNN2 moves forward a ﬁxed distance r at each step, its neighbor domain will change with time t. So in Eq. (7), N1;i ðr; tÞ is still used to indicate the present neighbor domain of C 1 ðiÞ at time t. Moreover, as derived from Eq. (3), besides jx1;i ð0Þj r 1 and ju1;i j r 1, Eq. (7) should also constrain ju2;j j r 1, considering the link input of CNN2 available to CNN1. 3.2. The alignment using the one-dimensional pairwise CNN

… m-1 CNN1

n-1 …

the three ones: C 1 ðm 1Þ, C 1 ðm 2Þ and C 2 ðn 1Þ, and occurs at t ¼ m þ n 1. Furthermore, in the right neighbor domain as shown in Fig. 5(a), C 1 ð0Þ receives the two link inputs and the two feedback inputs from itself and C 1 ð1Þ, and the link input from C 2 ð0Þ, ignoring the feedback from the output of C 2 ð0Þ. Similarly, as shown in Fig. 5(b), C 1 ðiÞ simultaneously receives the three link inputs and the three feedback inputs from itself, C 1 ði 1Þ and C 1 ði þ 1Þ, and the link input from C 2 ðjÞ. As shown in Fig. 5(c), C 1 ðm 1Þ receives the two link inputs and the two feedback inputs from itself and C 1 ðm 2Þ, and the link input from C 2 ðn 1Þ. For the pairwise CNN, since CNN1 works as the work center and CNN2 is regarded as the link supplier to CNN1, the state and output of CNN1 at any time step t 4 0 can be derived from Eqs. (3) and (4) as

u, y m-1

u, y n-1

Fig. 5. Three typical neighbor domains: (a) right neighbor domain; (b) double neighbor domain; and (c) left neighbor domain.

In practical applications, two DNA sequences to be aligned often have the different DNA sequence lengths. Therefore, a key step for a global alignment process is to generate an alignment path, then under the guidance of the path, blank spaces are inserted into the two sequences step by step. Finally, the two sequences with different lengths will be expanded into two new sequences of the same lengths [37,38], which, of course, include all those inserted blank spaces. By this alignment, similarity computation between two DNA sequences can be conducted. The algorithm proposed in this paper consists of three main steps, as detailed in Fig. 6. First, it initializes a one-dimensional pairwise CNN using the two original DNA sequences. Second, it generates a global alignment path based on the state selection array of the CNN. Last, it conducts a global alignment under the guidance of the path generated in the previous step. For clarity, the following example with two short DNA sequence fragments is used to present the detailed computation steps of this algorithm. This example contains the two DNA sequence fragments, S1 ¼ fA; A; G; C; T; C; T; Gg and S2 ¼ fC; A; G; C; A; Tg. Their sequence lengths are LenðS1 Þ ¼ 8 and LenðS2 Þ ¼ 6, respectively. Use S1 ðiÞ, i ¼ 0; 1; 2…, LenðS1 Þ 1, and S2 ðjÞ, j ¼ 0; 1; 2…, LenðS2 Þ 1, to respectively represent the ith and jth characters of S1 and S2. For the sake of numerical calculations, the ﬁve basic characters fn; A; C; G; Tg are

L. Ji et al. / Neurocomputing 149 (2015) 505–514

correspondingly quantiﬁed as f0; 1; 0:5; 0:5; 1g (which meets the constraints ju1;i j r 1 and ju2;j j r1), where ‘n’ represents a blank space.

where u1;i and u2;j represent the link inputs of C 1 ðiÞ and C 2 ðjÞ, respectively. After the initialization, the pairwise CNN with the initial state (namely at t¼0) is obtained, as exhibited in Fig. 7. Then, set these parameters C ¼1, Rx ¼1, I 1;i ¼ 0, the feedback template A ¼ f0; 0; 0; 0g, the control template B ¼ f0; 1; 0; 1g, x1;i ð0Þ ¼ 0, x2;j ð0Þ ¼ 0, and the selection factor template is set as F ¼ fF 0 ; F 1 ; F 2 g ¼ f5; 3; 2g, where the three prime numbers must be different from each other and follow F 0 4 F 1 4F 2 .

3.2.1. The initialization of the CNN On these ﬁve numerical characters, the two DNA sequences S1 and S2 are numerically transformed into: S01 ¼ f 1; 1; 0:5; 0:5; 1; 0:5; 1; 0:5g and S02 ¼ f 0:5; 1; 0:5; 0:5; 1; 1g. First, CNN1 is initialized as the master network with m ¼ Len ðS1 Þ þ 1 ¼ 9 cells, and CNN2 is initialized as the slave network with n ¼ LenðS2 Þ þ1 ¼ 7 cells. The link input u1;i of cell C 1 ðiÞ, i ¼ 0; 1; …; 8, and u2;j of cell C 2 ðjÞ, j ¼ 0; 1; …; 6, are respectively initialized using the numerical codes of S1 and S2, as follows: 8 ( > > S01 ði 1Þ if 1 r i r m 1; > > > u ¼ > 1;i > 0 if i ¼ 0; < ( 0 > > > u ¼ S2 ðn 2 jÞ if 0 r j r n 2; > 2;j > > 0 ifj ¼ n 1; > :

509

3.2.2. The generation of the global alignment path First, the state selection function φðÞ is deﬁned for the cell state C 1;i ðtÞ, 1 r t r 15, where φðÞ follows: ( x1;i ðtÞ if i ¼ 0 and t ¼ 1; ð9Þ x1;i ðtÞ ¼ φðx1;i ðtÞÞ ¼ maxðδ0 ; δ1 ; δ2 Þ else; where the three parameters δ1, δ2 and δ3 follow 8 > < δ0 ¼ x1;i 1 ðt 1Þ F 2 ; δ1 ¼ x1;i 1 ðt 2Þ þ y1;i ðtÞ F 0 ð1 y1;i ðtÞÞ F 1 ; > : δ ¼ x ðt 1Þ F :

ð8Þ

2

ð10Þ

2

1;i

Moreover, if the states of the cell selected for δ0, δ1 and δ2 do not exist at some time steps, these states will be ignored. For example, for cell C 1 ð0Þ at t ¼2, because i 1 o0, neither state x1;i 1 ðt 2Þ nor x1;i 1 ðt 1Þ exists. Therefore at this time step, both x1;i 1 ðt 2Þ and x1;i 1 ðt 1Þ are ignored. As a result, both δ0 and δ1, which directly depend on x1;i 1 ðt 2Þ and x1;i 1 ðt 1Þ, are also ignored. Fig. 8 exhibits the full steps of running this pairwise CNN. In these steps, as mentioned above, those cells without any link to CNN2 have been ignored. According to Eqs. (7) and (9), the full computation results of CNN1 at time step t, where 1 r t r 16, are listed as (1) at t¼ 1, y1;0 ð1Þ ¼ 1, x1;0 ð1Þ ¼ 0; (2) at t¼ 2, y1;0 ð2Þ ¼ 0, x1;0 ð2Þ ¼ 2; y1;1 ð2Þ ¼ 0, x1;1 ð2Þ ¼ 2; (3) at t¼3, y1;0 ð3Þ ¼ 0, x1;0 ð3Þ ¼ 4; y1;1 ð3Þ ¼ 0, x1;1 ð3Þ ¼ 3; y1;2 ð3Þ ¼ 0, x1;2 ð3Þ ¼ 3; (4) at t ¼4, y1;0 ð4Þ ¼ 0, x1;0 ð4Þ ¼ 6; y1;1 ð4Þ ¼ 1, x1;1 ð4Þ ¼ 3; y1;2 ð4Þ ¼ 0, x1;2 ð4Þ ¼ 5; y1;3 ð4Þ ¼ 0, x1;3 ð4Þ ¼ 6; (5) at t ¼5, y1;0 ð5Þ ¼ 0, x1;0 ð5Þ ¼ 8; y1;1 ð5Þ ¼ 0, x1;1 ð5Þ ¼ 1; y1;2 ð5Þ ¼ 1, x1;2 ð5Þ ¼ 2; y1;3 ð5Þ ¼ 0, x1;3 ð5Þ ¼ 7; y1;3 ð5Þ ¼ 0, x1;3 ð5Þ ¼ 8; ⋮ (14) at t¼ 14, y1;7 ð14Þ ¼ 1, x1;7 ð14Þ ¼ 12; y1;8 ð14Þ ¼ 0, x1;8 ð14Þ ¼ 3; (15) at t¼ 15, y1;8 ð15Þ ¼ 0, x1;8 ð15Þ ¼ 10; (16) at t¼ 16, state computation is completed. According to the movement characteristics of CNN2, since the last step in this example comes at t¼ 16, the CNN stops at that time, as shown in step (16). In general, for the designed pairwise CNN with m cells in CNN1 and n cells in CNN2, both y1;i ðtÞ and x1;i ðtÞ at each time step t are computed. When the CNN stops at t ¼ m þ n, a series of the cell selection states, x1;i ðtÞ, i ¼ 0; 1; …; m 1 and t ¼ 1; 2; …; m þ n 1,

Fig. 6. Flow chart of the alignment algorithm.

u= CNN2

u=

0

1

2

3

4

5

6

1

-1

-0.5

0.5

-1

-0.5

0

0

-1

-1

0.5

CNN1 -0.5

1

-0.5

1

0.5

0

1

2

3

4

5

6

7

8

Fig. 7. The initialization results of the pairwise CNN, at time t¼ 0.

510

L. Ji et al. / Neurocomputing 149 (2015) 505–514

u1 = 0 Fixed, CNN1

CNN2

-1

-1

0.5

-0.5

1

-0.5

1

0.5

0

1

2

3

4

5

6

7

8

t=1

0

1

2

3

4

5

6

u2 = 1

-1

-0.5

0.5

-1

-0.5

0

0

1

2

3

4

5

6

u2 = 1

-1

-0.5

0.5

-1

-0.5

0

0

1

2

3

4

5

6

u2 = 1

-1

-0.5

0.5

-1

-0.5

0

0

1

2

3

4

5

6

u2 = 1

-1

-0.5

0.5

-1

-0.5

0

CNN2

CNN2

CNN2

CNN2

t=2

0

1

2

3

4

5

6

-1

-0.5

0.5

-1

-0.5

0

t=5

…

…

0

t=4

u2 = 1 …

u1 = 0 Fixed, CNN1

t=3

-1

-1

0.5

-0.5

1

-0.5

1

0.5

1

2

3

4

5

6

7

8

CNN2

0

1

2

3

4

u2 = 1

-1

-0.5

0.5

-1

t=15 t=15 5 -0.5

6 0

Fig. 8. The cell state and output computation steps of the pairwise CNN, at t ¼ 1; 2; …; 15.

have been obtained step by step. After that, the selection array R nm with n rows and m columns using all x1;i ðtÞ, is constructed as 2

x1;0 ð1Þ

6 x1;0 ð2Þ 6 6 x1;0 ð3Þ 6 6 nm R ¼ 6 x1;0 ð4Þ 6 ⋮ 6 6 4 x1;0 ðn 1Þ x1;0 ðnÞ

x1;1 ð2Þ x1;1 ð3Þ x1;1 ð4Þ x1;1 ð5Þ ⋮ x1;1 ðnÞ x1;1 ðn þ 1Þ

3

x1;2 ð3Þ x1;2 ð4Þ

⋯ ⋯

x1;m 2 ðm 1Þ x1;m 2 ðmÞ

x1;m 1 ðmÞ x1;m 1 ðmþ 1Þ 7

x1;2 ð5Þ x1;2 ð6Þ

⋯ ⋯

x1;m 2 ðm þ1Þ x1;m 2 ðm þ2Þ

x1;m 1 ðmþ 2Þ x1;m 1 ðmþ 3Þ

x1;2 ðn þ2Þ

⋯

x1;m 2 ðm þn 2Þ

x1;m 1 ðmþ n 1Þ

7 7 7 7 7 7 ⋮ ⋱ ⋮ ⋮ 7 7 x1;2 ðn þ1Þ ⋯ x1;m 2 ðm þn 3Þ x1;m 1 ðmþ n 2Þ 5

ð11Þ Therefore, based on Eq. (11), all these cell selection states x1;i ðtÞ, where i ¼ 0; 1; …; 8 and t ¼ 1; 2; …; 15, obtained in the previous step are used to construct the selection array R 79 for CNN1 as 2 3 0 2 4 6 8 10 12 14 16 6 2 3 5 7 3 5 5 7 9 7 6 7 6 7 6 4 7 3 2 0 2 4 6 8 10 6 7 6 7 79 1 0 7 5 3 1 1 3 7 R ¼ 6 6 6 7 6 8 1 2 5 12 10 8 6 4 7 6 7 6 7 4 3 10 9 7 5 3 5 4 10 3 12 5 2 1 8 15 13 12 10 After having obtained the selection array, the global alignment path is to be determined. Deﬁne a node set P as the alignment path, where the initial P ¼ ∅. Moreover, use Rðn0 ; m0 Þ to represent the element at coordinate ðn0 ; m0 Þ of the array Rnm . By backtracing from the last state x1;m 1 ðm þn 1Þ to the ﬁrst one x1;0 ð1Þ of the array R nm step by step, the global alignment path can be determined as the following steps: 1. Let P ¼ fðn 1; m 1Þg, and N ¼ n 1; M ¼ m 1. 2. While N a 0 and M a 0, do the following loops: Find the maximum of RðN; M 1Þ, RðN 1; M 1Þ and RðN 1; MÞ. If any one of the three does not exist, ignore it. Save coordinates of the maximum into ðN 0 ; M 0 Þ. P ¼ fðN 0 ; M 0 Þg⋃P.

Update N with N 0 , and M with M 0 . 3. Obtain the path node set P. 4. The generation of alignment path is completed. As shown above, by back-tracing in the selection array R nm , we can obtain the global alignment path: P ¼ fð0; 0Þ; ð1; 0Þ; ð2; 1Þ; ð2; 2Þ; ð3; 3Þ; ð4; 4Þ; ð5; 4Þ; ð6; 5Þ; ð6; 6Þ; ð6; 7Þ; ð6; 8Þg. By now, the key global alignment path P for the two DNA sequences has been achieved. 3.2.3. The global alignment of two DNA sequences This part will exhibit how to conduct the global alignment of two original DNA sequences under the guidance of the global alignment path node set P. First, deﬁne P(i), i ¼ 0; 1; 2; …, to represent the ith element of the coordinate set P (namely, the path node set), and use lP to represent the size of P. The full alignment process is described as follows: 1. Let integer I ¼0, PðIÞ ¼ ðV I ; H I Þ. 2. While I olP 1, do the following loops: Compare PðI þ1Þ with P(I). case (i): if V I þ 1 V I ¼ 1 and H I þ 1 H I ¼ 0, a ‘n’ is inserted into S1 at the location of S1 ðIÞ; case (ii): if V I þ 1 V I ¼ 1 and H I þ 1 H I ¼ 1, no action; case (iii): if V I þ 1 V I ¼ 0 and H I þ 1 H I ¼ 1, a ‘n’ is inserted into S2 at the location of S2 ðIÞ; case (iv): else, no action. Update I with I ¼ I þ 1; 3. The global alignment is completed. Using the alignment process described above, the two DNA sequences, S1 and S2, in this example have been globally aligned. Fig. 9 shows the ﬁnal alignment results. In this ﬁgure, the vertical

L. Ji et al. / Neurocomputing 149 (2015) 505–514

511

Fig. 9. An example of the global alignment: (a) two original sequences; (b) two sequences with alignment labels.

Table 1 Time consumption of the CNN algorithm to complete the alignment (ms). S2

S1 S62051 Len ¼ 226

S62051 Len ¼226 NM_008134 Len ¼625 NM_010422 Len ¼1922 NM_000405 Len ¼3690 NG_009059 Len ¼24 343 NG_009301 Len ¼42 028

NM_008134 Len ¼ 625

NM_010422 Len ¼ 1922

NG_009059 Len ¼ 24 343

NG_009301 Len ¼42 028

72.23

76.15

80.12

140.27

446.75

731.19

76.15

78.63

91.46

152.69

466.78

786.61

80.12

91.46

124.66

225.29

531.13

852.82

140.27

152.69

225.29

280.12

615.75

998.58

446.75

466.78

531.13

615.75

N/A

N/A

731.19

786.61

852.82

998.58

N/A

N/A

line ‘j’ between two nucleotide bases indicates one labelled couple of the nucleotide bases which have been aligned exactly. Moreover, ‘n’ still represents a blank space which has been inserted into these two sequences. 3.2.4. The numerical evaluation on the alignment For the DNA sequences in the example above, both the lengths LenðS1 Þ and LenðS2 Þ are quite short, so it is easy to evaluate the alignment results by directly checking these sequences with the alignment labels. However, a majority of DNA sequences are very long, so checking the sequence alignment labels is no longer an effective method to evaluate the performances of the alignment algorithm. Another important and frequently used method is to calculate the global similarity between two DNA sequences [29,37,38]. In general, on a same sequence pair, a good alignment algorithm often generates a high numerical similarity, while a bad one often results in a low similarity. In this paper, we deﬁne SCðS1 ; S2 Þ as the numerical evaluation criterion of the global similarity between S1 and S2: SCðS1 ; S2 Þ ¼

NM_000405 Len ¼ 3690

2 N Label 100% LenðS1 Þ þ LenðS2 Þ

ð12Þ

where NLabel indicates the number of these nucleotide bases with alignment labels, LenðS1 Þ and LenðS2 Þ still represent the lengths of the original sequences S1 and S2, respectively. Furthermore, for the example shown in Fig. 9, it shows that LenðS1 Þ ¼ 8, LenðS2 Þ ¼ 6 and N label ¼ 4. Therefore, SCðS1 ; S2 Þ ¼ ð2 4Þ=ð8 þ 6Þ 100% 57:14%, according to Eq. (12). Generally, those classic alignment algorithms for DNA sequences, such as the dot pot [30], the dynamic programming [29,31], the heuristic algorithms such as FASTA [32] and BLAST [33], could achieve good alignment results but often have high time complexity. For these classic algorithms mentioned above, their time complexity is represented as OðTÞ ¼ Oðm nÞ, which implies that the computation time cost is in proportion to the product of two sequence lengths. However, the alignment algorithm using a one-dimensional pairwise CNN can generate the selection array of the cells just at time

t ¼ m þ n þ1 because of the parallel computation of the neural networks. So the time complexity O(T) of the algorithm in this paper is OðTÞ ¼ Oðm; nÞ Oðm þ n þ 1Þ, which is approximately in proportion to the total length of the two DNA sequences to be aligned.

4. Experimental evaluations In order to evaluate the proposed method and compare it with the others, the following experiments have been done. In these experiments, all these alignment approaches are simulated using MATLAB software (version: R2011a), and tested on the same computer with 3.2 GHz dual-core CPU, 2 GB RAM and Windows XP operating system. All the DNA sequences in these experiments are taken randomly from the publicly available INSDC Tay-Sachs disease section of the NCBI, U.S.A. This section contains forty-seven different nucleotide sequences. In general, both the global similarity and the computation time cost are often taken as the criteria to evaluate a DNA alignment algorithm [29,37,38]. This paper also uses these criteria to assess the outperformance of the proposed CNN algorithm over the others. 4.1. Algorithm testing results In these experiments on the sequences from the Tay-Sachs section, the computation time and similarity values with this developed CNN algorithm are summarized in Tables 1 and 2, respectively. In these two tables, Len represents the length of a sequence, and ‘N/A’ indicates that the corresponding alignment has not been successfully executed due to the insufﬁcient memory of the computer. Table 1 clearly shows that the computation time of this CNN algorithm gradually increases with the increase of the total length (approximately, its computation time follows Oðm þ nÞ). The longer the total length of the two sequences is, the more the computation time it takes. When the total length of the two sequences reached to the maximum which the experiment computer can handle, this

512

L. Ji et al. / Neurocomputing 149 (2015) 505–514

Table 2 Global similarity obtained with the CNN algorithm (%). S2

S1 S62051 Len ¼226

S62051 Len ¼ 226 NM_008134 Len ¼ 625 NM_010422 Len ¼ 1922 NM_000405 Len ¼ 3690 NG_009059 Len ¼ 24 343 NG_009301 Len ¼ 42 028

100 39.95

NM_008134 Len ¼ 625 39.95 100

NM_010422 Len ¼1922

NM_000405 Len ¼ 3690

NG_009059 Len ¼24 343

NG_009301 Len ¼ 42 028

17.04

8.53

1.40

0.82

35.81

20.35

3.52

2.25

50.11

10.57

6.50

25.15

11.68

17.04

35.81

100

8.53

21.88

50.11

1.40

3.52

10.57

25.15

N/A

N/A

0.82

2.25

6.50

11.68

N/A

N/A

100

Table 3 Computation time comparisons of the three algorithms (ms).

MILP SPA The proposed

Pairs S62051 NM_008134

NM_308134 NM_000405

NG_009059 NM_008134

NG_009059 NM_010422

NM_010422 NG_009301

NG_009301 NM_000405

80.91 96.42 76.15

237.88 262.56 152.69

595.71 756.45 466.78

735.91 920.13 531.13

1223.67 1560.22 852.82

2212.34 3043.74 998.58

CNN algorithm failed to align the sequences. This does not mean that this CNN algorithm does not work for the long sequences but just implies that a computer with a sufﬁcient memory can facilitate this new CNN algorithm especially in case of aligning the two long DNA sequences. Table 2 exhibits the experiment results in terms of the global similarity deﬁned by Eq. (12). These experiment results show that this CNN algorithm always obtains the precise similarity values, such as 35.81% between NM_008134 and NM_010422, and always ensures the global similarity of 100% between the sequence and itself (e.g. S62051 vs. itself). This algorithm exactly behaves as what are described before. 4.2. Algorithm comparisons In order to compare this CNN algorithm with the others, more experiments with the DNA sequences from NCBI have been executed and compared to each other. These two algorithms used for comparisons include the mixed-integer linear optimization (abbreviated as MILP, 2007) algorithm [37] and the super pairwise alignment (abbreviated as SPA, 2002) algorithm [38]. Just as mentioned in the publications [29,37,38], the computation time and the numerical similarity are still taken as the two criteria to compare these algorithms. In order to test and compare these three algorithms, a number of nucleotide sequence pairs with various lengths are randomly chosen from the NCBI databases and tested one by one using these three alignment algorithms. Table 3 summarizes and compares their experiment results. The unit for the computation time in Table 3 is millisecond (ms). These sequence pairs in this table cover the sequences containing 400–50 000 nucleotide bases. The shortest total length of two sequences is 851 (S62051 vs. NM_008134), and the longest

3500

The algorithm computation time, ms

Methods

3000

MILP SPA the proposed

2500

2000

1500

1000

500

0 102

103

104

105

The sum of two sequence lengths, m+n Fig. 10. The computation time curves of these three algorithms on different sequence lengths.

total length is 45718 (NG_009301 vs. NM_000405). Note that no sequence with more than 50 000 nucleotide bases has been tested in this paper due to the insufﬁcient computer memory. In terms of the computation time, as shown in Table 3, the algorithm proposed in this paper is the fastest one for both the shortest sequence pair S62051 vs. NM_008314 and the longest sequence pair NG_009301 vs. NM_000405. It took only 71.16 ms with the proposed CNN algorithm to complete the alignment of S62051 vs. NM_008314. It is 20.27 ms faster than SPA, and 4.76 ms faster than MILP. For those long sequence pairs, such as NG_009301

L. Ji et al. / Neurocomputing 149 (2015) 505–514

513

Table 4 The Global similarity comparisons (%). Methods

MILP SPA The proposed

Pairs S62051 NM_008134

NM_008134 NM_000405

NG_009059 NM_008134

NG_009059 NM_010422

NM_010422 NG_009301

NG_009301 NM_000405

39.95 39.95 39.95

21.78 21.74 21.88

3.49 3.48 3.52

10.55 10.54 10.57

6.49 6.48 6.50

11.67 11.65 11.68

vs. NM_000405, the proposed algorithm took 998.58 ms to complete the global alignment, and it is approximately 2.2 times faster than MILP, approximately 3.0 times faster than SPA, as shown in Table 3. Table 3 also shows that the new algorithm based on the pairwise CNN is the fastest one in terms of the computation time. Although the proposed one is only slightly better than the other two for the short DNA sequences, for the longer ones (such as, more than 10 000 nucleotides), the proposed one is signiﬁcantly better than the other two. When the total length of the two sequences is very large (e.g. NG_009301 vs. NM_000405), the proposed algorithm is approximately 55% faster than MILP, and approximately 68% faster than SPA. These experiments and these comparisons of the computation time prove that the proposed algorithm performs much better than both MILP and SPA. Especially, the proposed algorithm will over-perform more signiﬁcantly over the other two as aligning the long DNA sequences. Moreover, in order to visually demonstrate the advantages of the proposed algorithm in terms of the time performance, another group of experiments have been done to relate the time and the total length of two sequences. In these experiments, a group of sequence pairs are aligned using the three algorithms described above. All these experiment results are plotted in a Cartesian coordinate system, as shown in Fig. 10. The three curves illustrate the relationships between the computation time and the total length of the sequence pair for these three algorithms. In Fig. 10, the horizontal-axis is corresponding to the total lengths of these sequence pairs, and the vertical-axis is corresponding to the computation time taken by these three algorithms to align these sequence pairs. It clearly shows that these three algorithms have a quite close performance in terms of the time when the total length, m þ n is not very large. With the increase of the total length, except the CNN curve, both SPA curve and MILP curve rise up steeply. Although the curve of the proposed algorithm also rises up with the increase of the total length as well, it rises up much slower than the other two. Same as Table 3, Fig. 10 also indicates that the proposed one needs the less computation time to align these sequence pairs, and the larger the m þ n is, the more the superior it is. Just as mentioned before, besides the computation time, the global similarity is also of importance to distinct these three algorithms. The experiments on the global similarity of the following DNA sequence pairs are also conducted, where the numerical global similarity is calculated using Eq. (12), as exhibited in Table 4. Consistent with Table 3, Table 4 shows that each of the three algorithms can achieve a good similarity for the DNA sequence pair. For the short pair such as S62051 vs. NM_008134, the three algorithms obtain an identical similarity of 39:95%. However, for the longer DNA sequence pairs such as NM_010422 vs. NG_009301, the global similarity values with these three algorithm are slightly different. Usually, the proposed new CNN algorithm results in a little bit larger similarity value than the other two. The reason is that this CNN algorithm always generates

a little bit more alignment pairs of the nucleotide bases than the other two. Table 4 also shows that the proposed CNN algorithm can often obtain the higher similarity than the other two especially on these long DNA sequence pairs.

5. Conclusions The cellular neural network is one of the classic neural networks. During the past decades, the two-dimensional CNNs have attracted the most interests from the researchers but the onedimensional models of the CNN have not yet been paid enough attention too. In order to exploit a new model for the CNN and develop its applications, this paper proposes a simpliﬁed onedimensional CNN model, which consists of a linear cell chain. Then a pairwise CNN is constructed using two one-dimensional CNNs, including an immovable one and a movable one. Based on the pairwise CNN, the global alignment algorithm for two DNA sequences is developed in this paper. The simulation and comparison experiments are carried out and evaluated with the DNA sequences from the publicly available databases of the NCBI. By comparing with the other two algorithms, the developed one performs better in the global alignment with the signiﬁcantly less execution time. These experiments in this paper prove that the proposed model is practically applicable and the algorithm with a pairwise one-dimensional CNN is efﬁcient.

Acknowledgments This work is supported by the National Science Foundation of China (NSFC) under Grants 61175061 and 61273308, and the Fundamental Research Funds for the Central Universities under Grant ZYGX2013J076. Authors would also like to thank NCBI for the experiment data of DNA sequences, and thank all editors/ reviewers for their comments and helps on manuscript improvement. References [1] L.O. Chua, L. Yang, Cellular neural networks: theory, IEEE Trans. Circuits Syst. 35 (1988) 1257–1272. [2] L.O. Chua, L. Yang, Cellular neural networks: applications, IEEE Trans. Circuits Syst. 35 (1988) 1273–1290. [3] Z. Liu, H. Zhang, Z. Wang, Novel stability criterions of a new fuzzy cellular neural networks with time-varying delays, Neurocomputing 72 (2009) 1056–1064. [4] S. Long, D. Xu, Stability analysis of stochastic fuzzy cellular neural networks with time-varying delays, Neurocomputing 74 (2011) 2385–2391. [5] L. Wang, T. Chen, Complete stability of cellular neural networks with unbounded time-varying delays, Neural Netw. 36 (2012) 11–17. [6] W.H. Chen, W.X. Zheng, A new method for complete stability analysis of cellular neural networks with time delay, IEEE Trans. Neural Netw. 7 (2010) 1126–1139. [7] X. Song, X. Xin, W. Huang, Exponential stability of delayed and impulsive cellular neural networks with partially Lipschitz continuous activation functions, Neural Netw. 29–30 (2012) 80–90.

514

L. Ji et al. / Neurocomputing 149 (2015) 505–514

[8] M. Di Marco, M. Forti, M. Grazzini, L. Pancioni, Convergence of a class of cooperative standard cellular neural network arrays, IEEE Trans. Circuits Syst. I: Regul. Pap. 4 (2012) 772–783. [9] X. Huang, Z. Zhao, Z. Wang, Y. Li, Chaos and hyperchaos in fractional-order cellular neural networks, Neurocomputing 94 (2012) 13–21. [10] L. Wang, W. Liu, H. Shi, J.M. Zurada, Cellular neural networks with transient chaos, IEEE Trans. Circuits Syst. II: Express Briefs 5 (2007) 440–444. [11] C.L. Chang, K.W. Fan, I.F. Chung, C.T. Lin, A recurrent fuzzy coupled cellular neural network system with automatic structure and template learning, IEEE Trans. Circuits Syst. II: Express Briefs 8 (2004) 602–606. [12] C.H. Huang, C.T. Lin, Bio-inspired computer fovea model based on hexagonaltype cellular neural network, IEEE Trans. Circuits Syst. I: Regul. Pap. 1 (2007) 35–47. [13] L. Zhang, Z. Yi, S.L. Zhang, P.A. Heng, Activity invariant sets and exponentially stable attractors of linear threshold discrete-time recurrent neural networks, IEEE Trans. Autom. Control 6 (2009) 1341–1347. [14] G. Manganaro, J.P. de Gyvez, One-dimensional discrete-time CNN with multiplexed template-hardware, IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 5 (2000) 764–769. [15] S.N.T. Polat, O. Yavuz, V. Tavsanoglu, Efﬁcient Simulation of Time-Derivative Cellular Neural Networks, IEEE Trans. Circuits Syst. I: Regul. Pap. 11 (2012) 2638–2645. [16] J. Martłnez, J. Garrigs, J. Toledo, J. Ferr¢ndez, An efﬁcient and expandable hardware implementation of multilayer cellular neural networks, Neurocomputing 114 (2013) 54–62. [17] C.Y. Wu, S.H. Chen, The design and analysis of a CMOS low-power largeneighborhood CNN with propagating connections, IEEE Trans. Circuits Syst. I: Regul. Pap. 2 (2009) 440–452. [18] E. Cesur, N. Yildiz, V. Tavsanoglu, On an improved FPGA implementation of CNN-based Gabor-type ﬁlters, IEEE Trans. Circuits Syst. II: Express Briefs 11 (2012) 815–819. [19] D. Amanatidis, D. Tsaptsinos, P. Giaccone, G. Jones, Optimizing motion and colour segmented images with neural networks, Neurocomputing 62 (2004) 197–223. [20] Y.W. Shou, C.T. Lin, Image descreening by GA-CNN-based texture classiﬁcation, IEEE Trans. Circuits Syst. I: Regul. Pap. 11 (2004) 2287–2299. [21] C.T. Lin, C.H. Huang, S.A. Chen, CNN-based hybrid-order texture segregation as early vision processing and its implementation on CNN-UM, IEEE Trans. Circuits Syst. I: Regul. Pap. 10 (2007) 2277–2287. [22] Q. Gao, G.S. Moschytz, Fingerprint feature matching using CNNs, in: Proceedings of the 2004 International Symposium on Circuits and Systems, vol. 3, 2004, pp. 73–76. [23] M. Milanova, U. Bker, Object recognition in image sequences with cellular neural networks, Neurocomputing 31 (2000) 125–141. [24] A. Delbem, L. Correa, L. Zhao, Design of associative memories using cellular neural networks, Neurocomputing 72 (2009) 2180–2188. [25] Q. Han, X. Liao, T. Huang, et al., Analysis and design of associative memories based on stability of cellular neural networks, Neurocomputing 97 (2012) 192–200. [26] Z. Zeng, J. Wang, Analysis and design of associative memories based on recurrent neural networks with linear saturation activation functions and time-varying delays, Neural Comput. 8 (2007) 2149–2182. [27] N. Takahashi, M. Nagayoshi, S. Kawabata, T. Nishi, Stable patterns realized by a class of one-dimensional two-layer CNNs, IEEE Trans. Circuits Syst. II: Regul. Pap. 11 (2008) 3607–3620. [28] N. Takahashi, et al., Sufﬁcient conditions for one-dimensional cellular neural networks to perform connected component detection, Nonlinear Anal.: Real World Appl. 11 (2010) 4202–4213. [29] S. Needleman, C. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two sequences, J. Mol. Biol. 3 (1970) 443–453. [30] A.J. Gibbs, G.A. Mcintyre, A. George, The diagram, a method for comparing sequences, its use with amino acid and nucleotide sequences, Eur. J. Biochem. 1 (1970) 1–11. [31] T.F. Smith, M.S. Watermann, Identiﬁcation of common molecular subsequence, J. Mol. Biol. 147 (1981) 196–197. [32] D.J. Lipman, W.R. Pearson, Rapid and sensitive protein similarity searchers, Science 4693 (1985) 1435–1441. [33] S.F. Altschul, T.L. Madden, A.A. Schaffer, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 7 (1997) 3389–3402. [34] F. Corpet, Multiple sequence alignment with hierarchical clustering, Nucleic Acids Res. 22 (1988) 10881–10890.

[35] D.F. Feng, R.F. Doolittle, Progressive sequence alignment as a prerequisite to correct phylogenetic trees, J. Mol. Evol. 25 (1987) 351–360. [36] S.A.M.A. Junid, N.M. Tahir, Z.A. Majid, M.F.M. Idros, Potential of graph theory algorithm approach for DNA sequence alignment and comparison, in: Proceedings of the Third International Conference on Intelligent Systems, Modelling and Simulation (ISMS 2012), vol. 2, 2012, pp. 187–190. [37] S.R. Mcallister, R. Rajgaria, C.A. Floundas, Global pairwise sequence alignment through mixed-integer linear programming: a template-free approach, Optim. Methods Softw. 1 (2007) 127–144. [38] S. Shen, J. Yang, A. Yao, P. Hwang, Super pairwise alignment (SPA): an efﬁcient approach to global alignment for homologous sequences, J. Comput. Biol. 3 (2002) 477–486.

Luping Ji received his B.S. degree in Mechanical & Electronic Engineering from Beijing Institute of Technology, Beijing, PR China, 1999. Then, he received his M.S. and Ph.D. degrees respectively in Computer Application & Technology, 2005 and in Computer Software & Theory, 2008 from the University of Electronic Science and Technology of China, Chengdu, PR China. Currently, he is working as an associate professor in the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, PR China. His current research interests include neural networks and pattern recognition.

Xiaorong Pu is a professor at the University of Electronic Science and Technology of China, PR China. She has been a visiting scholar at Illinois Institute of Technology (IIT), USA, 2008, and at the University of Manchester Institute of Science and Technology (UMIST), Britain 2004. Her research interests include computer vision, biometrics, neural networks and operating system. She has 8 books and more than 30 academic papers published so far.

Hong Qu received the B.S., M.S. and Ph.D. degrees in Computer Science and Engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2000, 2003 and 2006, respectively. From March 2007 to February 2008, he was a postdoctoral fellow at the Advanced Robotics and Intelligent Systems Lab, School of Engineering, University of Guelph, Guelph, ON, Canada. Currently, he is currently a professor in the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His current research interests include neural networks, robot, neurodynamics, intelligent computation, and optimization.

Guisong Liu received his B.S. degree in Mechanics from Xi'an Jiao Tong University, Xi'an, China, in 1995, the M.S. degree in Automatics and the Ph.D. degree in Computer Science both from the University of Electronic Science and Technology of China (UESTC, Chengdu, China), in 2000 and 2007, respectively. Now he is an associate professor in the Computational Intelligence Laboratory, the School of Computer Science and Engineering, UESTC. His research interests include computational intelligence, pattern recognition and machine learning.

One-dimensional pairwise CNN for the global alignment of two DNA sequences

One-dimensional pairwise CNN for the global alignment of two DNA sequences

Recommend Documents