Neurocomputing 117 (2013) 81–90


A hybrid quantum-inspired neural networks with sequence inputs

Panchi Li*, Hong Xiao, Fuhua Shang, Xifeng Tong, Xin Li, Maojun Cao

School of Computer and Information Technology, Northeast Petroleum University, Daqing, China

Article history: Received 28 May 2012; Received in revised form 21 January 2013; Accepted 25 January 2013; Communicated by G. Thimm; Available online 4 March 2013.

Abstract

To enhance the performance of classical neural networks, a quantum-inspired neural networks model based on controlled-Hadamard gates is proposed. In this model, the inputs are discrete sequences described by a matrix in which the number of rows equals the number of input nodes and the number of columns equals the sequence length. The model has three layers, in which the hidden layer consists of quantum neurons and the output layer consists of classical neurons. The quantum neuron consists of quantum rotation gates and multi-qubit controlled-Hadamard gates. A learning algorithm is presented in detail according to the basic principles of quantum computation. The characteristics of the input sequence can be effectively captured in both breadth and depth. The experimental results show that, when the number of input nodes is close to the sequence length, the proposed model is obviously superior to BP neural networks.

Keywords: Quantum computation; Quantum rotation gates; Controlled-Hadamard gates; Quantum neuron; Quantum neural networks

1. Introduction

In many applications, the system input is a temporal process, such as a chemical reaction process or the stock market volatility process [1,2]. Many neurophysiological experiments indicate that the information processing characteristics of the biological nervous system mainly include the following eight aspects: spatial aggregation, multi-factor aggregation, the temporal cumulative effect, the activation threshold characteristic, self-adaptability, exciting and restraining characteristics, delay characteristics, and conduction and output characteristics [3]. From the definition of the M-P neuron model, classical ANNs simulate many characteristics of biological neurons quite well, such as spatial weight aggregation, self-adaptability, and conduction and output, but they do not fully incorporate the temporal cumulative effect, because the outputs of an ANN depend only on the inputs at the current moment, regardless of prior moments. In practical information processing, the memory and output of a biological neuron depend not only on the spatial aggregation of multidimensional input information, but also on the temporal cumulative effect. Since Kak first proposed the concept of quantum-inspired neural computation [4] in 1995, quantum neural networks (QNNs) have attracted great attention from international scholars during the past decade, and a large number of novel techniques have been studied for quantum computation and neural

* Corresponding author. Tel.: +86 459 6507708. E-mail address: [email protected] (P. Li).

http://dx.doi.org/10.1016/j.neucom.2013.01.029

networks. For example, Ref. [5] proposed a model of quantum neural networks with multilevel hidden neurons based on the superposition of quantum states in quantum theory. In Ref. [6], an attempt was made to reconcile the linear reversible structure of quantum evolution with the nonlinear irreversible dynamics of neural networks. In 1998, a new neural networks model with quantum circuits was developed for quantum computation and was proven to exhibit a powerful learning capability [7]. Matsui et al. developed a quantum neural networks model using the single-bit rotation gate and the two-bit controlled-NOT gate, and investigated its performance in solving the four-bit parity check and function approximation problems [8]. Altaisky suggested that a quantum neural network can be built using the principles of quantum information processing [9]; in his model, the input and output qubits of the QNN were implemented by optical modes with different polarization, and the weights were implemented by optical beam splitters and phase shifters. Ref. [10] proposed a completely different kind of network from the mainstream works, in which neurons are states connected by gates. In our previous work [11], we proposed a quantum BP neural networks model with a learning algorithm based on single-qubit rotation gates and two-qubit controlled-NOT gates. Ref. [12] proposed a wave probabilities resonance principle describing quantum entanglement and demonstrated possible applications of the theory. Ref. [13] presented models of quasi-non-ergodic probabilistic systems defined through the theory of wave probabilistic functions, and showed two illustrative examples of applications of the introduced theories and models. Ref. [14] proposed a weightless model based on quantum circuits that is not only quantum-inspired but is


actually a quantum NN. This model is based on Grover's search algorithm, and it can perform both quantum learning and simulation of the classical models. However, like M-P neurons, it also does not fully incorporate the temporal cumulative effect, because a single input sample is either unrelated to time or related to a single moment instead of a period of time. In order to fully simulate biological neuronal information processing mechanisms, and to enhance the approximation and generalization ability of ANNs, this paper proposes a novel quantum-behaved neural networks model based on controlled-Hadamard gates, called CHQNN. Our network is a three-layer model with one hidden layer, which employs the gradient descent principle for learning. The input/output relationship of this model is derived based on the physical meaning of the quantum gates. The experimental results show that, under certain conditions, the CHQNN is obviously superior to the common BP neural networks.

2. The qubits and quantum gates

2.1. Qubits

In quantum computing, a qubit is a two-level quantum system, described by a two-dimensional complex Hilbert space. From the superposition principle, any state of the qubit may be written as

|ψ⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩,   (1)

where 0 ≤ θ ≤ π and 0 ≤ φ ≤ 2π. Therefore, unlike the classical bit, which can only be set equal to 0 or 1, the qubit resides in a vector space parametrized by the continuous variables θ and φ. Thus, a continuum of states is allowed. The Bloch sphere representation is useful in thinking about qubits since it provides a geometric picture of the qubit and of the transformations that one can operate on the state of a qubit. Owing to the normalization condition, the qubit's state can be represented by a point on a sphere of unit radius, called the Bloch sphere. This sphere can be embedded in a three-dimensional space of Cartesian coordinates (x = cos φ sin θ, y = sin φ sin θ, z = cos θ). By definition, a Bloch vector is a vector whose components (x, y, z) single out a point on the Bloch sphere. We can say that the angles θ and φ define a Bloch vector, as shown in Fig. 1(a), where the points corresponding to the following states are shown: |A⟩ = [1, 0]^T, |B⟩ = [0, 1]^T, |C⟩ = |E⟩ = [1/√2, 1/√2]^T, |D⟩ = [1/√2, −1/√2]^T, |F⟩ = [1/√2, i/√2]^T, |G⟩ = [1/√2, −i/√2]^T.

For convenience, in this paper we represent the qubit's state by a point on a circle of unit radius, as shown in Fig. 1(b). The corresponding relations between Fig. 1(a) and (b) can be written as

α: 0 → π/2   ⇔  φ = 0 and θ: π/2 → 0,
α: π/2 → π   ⇔  φ = π and θ: 0 → π/2,
α: π → 3π/2  ⇔  φ = π and θ: π/2 → π,
α: 3π/2 → 2π ⇔  φ = 0 and θ: π → π/2.   (2)

At this time, any state of the qubit may be written as

|ψ⟩ = cos α|0⟩ + sin α|1⟩.   (3)

An n-qubit system has 2^n computational basis states. For example, a 2-qubit system has the basis |00⟩, |01⟩, |10⟩, |11⟩. Similar to the case of a single qubit, the n-qubit system may form superpositions of the 2^n basis states,

|ψ⟩ = Σ_{x ∈ {0,1}^n} a_x |x⟩,   (4)

where a_x is called the probability amplitude of the basis state |x⟩, and {0,1}^n means the set of strings of length n with each letter being either zero or one. The condition that these probabilities sum to one is expressed by the normalization condition

Σ_{x ∈ {0,1}^n} |a_x|^2 = 1.   (5)

2.2. Quantum rotation gate

A quantum gate is the analogue of a logic gate in a classical computer, and quantum gates are the basic units of quantum algorithms. The difference between the classical and quantum context is that a quantum gate has to be implemented reversibly and, in particular, must be a unitary operation. The definition of a single-qubit rotation gate is given by

R(θ) = [cos θ, −sin θ; sin θ, cos θ].   (6)

Let the quantum state |φ⟩ = [cos θ_0, sin θ_0]^T; then |φ⟩ can be transformed by R(θ) as follows:

R(θ)|φ⟩ = [cos(θ_0 + θ), sin(θ_0 + θ)]^T.   (7)

It is obvious that R(θ) shifts the phase of |φ⟩.
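As a concrete illustration of Eqs. (6) and (7), the following short sketch (our own, not code from the original paper; the function name is illustrative) applies R(θ) to a real-amplitude qubit state and confirms that the rotation simply shifts the phase angle.

```python
import numpy as np

def rotation_gate(theta):
    """Single-qubit rotation R(theta) of Eq. (6), restricted to real amplitudes."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# |phi> = [cos(theta0), sin(theta0)]^T as in Eq. (7)
theta0, theta = 0.3, 0.5
phi = np.array([np.cos(theta0), np.sin(theta0)])

rotated = rotation_gate(theta) @ phi
# The result equals [cos(theta0 + theta), sin(theta0 + theta)]^T,
# i.e. R(theta) shifts the phase angle of |phi> by theta.
assert np.allclose(rotated, [np.cos(theta0 + theta), np.sin(theta0 + theta)])
```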

Fig. 1. A qubit description. (a) A qubit description on the Bloch sphere. (b) A qubit description on the unit circle.


2.3. Hadamard gate

The Hadamard gate is defined as

H = (1/√2) [1, 1; 1, −1].   (8)

This gate turns the computational basis |0⟩, |1⟩ into the new basis |+⟩, |−⟩, whose states are superpositions of the states of the computational basis:

H|0⟩ = (1/√2)(|0⟩ + |1⟩) ≡ |+⟩,   (9)
H|1⟩ = (1/√2)(|0⟩ − |1⟩) ≡ |−⟩.   (10)

Since H^2 = I, H is equal to its own inverse, H^{−1} = H. Note that H is Hermitian [15]; indeed, it is evident from the matrix representation (8) that (H^T)* = H.

2.4. Unitary operators and tensor products

A matrix U is said to be unitary if (U*)^T U = I, where * indicates complex conjugation, T indicates the transpose operation, and I indicates the unit matrix. Similarly, an operator U is unitary if (U*)^T U = I. It is easily checked that an operator is unitary if and only if each of its matrix representations is unitary.

The tensor product is a way of putting vector spaces together to form larger vector spaces. This construction is crucial to understanding the quantum mechanics of multiparticle systems. Suppose V and W are vector spaces of dimensions m and n, respectively; for convenience we also suppose that V and W are Hilbert spaces. Then V ⊗ W (read "V tensor W") is an mn-dimensional vector space. The elements of V ⊗ W are linear combinations of "tensor products" |v⟩ ⊗ |w⟩ of elements |v⟩ of V and |w⟩ of W. In particular, if |i⟩ and |j⟩ are orthonormal bases for the spaces V and W, then {|i⟩ ⊗ |j⟩} is a basis for V ⊗ W. We often use the abbreviated notations |v⟩|w⟩, |v, w⟩ or even |vw⟩ for the tensor product |v⟩ ⊗ |w⟩. For example, if V is a two-dimensional vector space with basis vectors |0⟩ and |1⟩, then |0⟩ ⊗ |0⟩ and |1⟩ ⊗ |1⟩ are elements of V ⊗ V.

2.5. Multi-qubit controlled-Hadamard gate

In a true quantum system, a single qubit state is often affected by the joint control of multiple qubits. A multi-qubit controlled-Hadamard gate C^n(H) is one kind of control model. The multi-qubit system is also described by the wave function |x_1 x_2 ⋯ x_n⟩. In an (n+1)-bit quantum system, when the target bit is simultaneously controlled by n input bits, the dynamic behavior of the system can be described by a multi-qubit controlled-Hadamard gate, as shown in Fig. 2. In Fig. 2(a), suppose we have n + 1 qubits and H denotes a Hadamard gate. Then we define the controlled operation C^n(H) as follows:

C^n(H)|x_1 x_2 ⋯ x_n⟩|φ⟩ = |x_1 x_2 ⋯ x_n⟩ H^{x_1 x_2 ⋯ x_n} |φ⟩,   (11)

where x_1 x_2 ⋯ x_n in the exponent of H means the product of the bits x_1, x_2, …, x_n. That is, the operator H is applied to the last qubit if the first n qubits are all equal to one; otherwise, nothing is done.

Suppose that the |x_i⟩ = a_i|0⟩ + b_i|1⟩ are the control qubits and |φ⟩ = c|0⟩ + d|1⟩ is the target qubit. From Eq. (11), the output of C^n(H) is written as

C^n(H)|x_1 x_2 ⋯ x_n⟩|φ⟩ = |x_1⟩ ⊗ |x_2⟩ ⊗ ⋯ ⊗ |x_n⟩ ⊗ |φ⟩ − b_1 b_2 ⋯ b_n c |1⋯10⟩ − b_1 b_2 ⋯ b_n d |1⋯11⟩ + √0.5 b_1 b_2 ⋯ b_n (c + d) |1⋯10⟩ + √0.5 b_1 b_2 ⋯ b_n (c − d) |1⋯11⟩.   (12)

We say that a state of a composite system having the property that it cannot be written as a product of states of its component systems is an entangled state. For reasons which nobody fully understands, entangled states play a crucial role in quantum computation and quantum information. It is observed from Eq. (12) that the output of C^n(H) is in an entangled state of n + 1 qubits, and the probability of observing |1⟩ in the target qubit state |φ'⟩ equals

P = 0.5 (b_1 b_2 ⋯ b_n)^2 (c^2 − d^2 − 2cd) + d^2.   (13)

In Fig. 2(b), the operator H is applied to the last qubit if the first n qubits are all equal to zero; otherwise, nothing is done. This controlled operation C^n(H) can be defined by the equation

C^n(H)|x_1 x_2 ⋯ x_n⟩|φ⟩ = |x_1 x_2 ⋯ x_n⟩ H^{x̄_1 x̄_2 ⋯ x̄_n} |φ⟩,   (14)

where x̄_i = 1 − x_i. By an analysis similar to that of Fig. 2(a), the probability of observing |1⟩ in the target qubit state |φ'⟩ equals

P = 0.5 (a_1 a_2 ⋯ a_n)^2 (c^2 − d^2 − 2cd) + d^2.   (15)

At this time, after the joint control of the n input bits, the target bit |φ'⟩ can be written as

|φ'⟩ = √(1 − P)|0⟩ + √P|1⟩.   (16)

Fig. 2. Multi-qubit controlled-Hadamard gate. (a) Type I. (b) Type II.

3. The quantum neuron model

In this section, we first propose a quantum neuron model, as illustrated in Fig. 3. This model consists of quantum rotation gates and multi-qubit controlled-Hadamard gates. The input sequences are the qubits {|x_i(t_j)⟩} defined on the time-domain interval [0, T], where t_j ∈ [0, T]. The output is the spatial and temporal aggregation result |y⟩ on [0, T], and the control parameters are the rotation angles θ̄_i(t_j), where i = 1, 2, …, n, j = 1, 2, …, q; n denotes the dimension of the input space and q denotes the length of the input sequence. Let 0 = t_1 < t_2 < ⋯ < t_q = T represent the sampling time points; then |x_i(t_r)⟩ can be written as

|x_i(t_r)⟩ = cos θ_i(t_r)|0⟩ + sin θ_i(t_r)|1⟩,   (17)

where r = 1, 2, …, q.
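The probability in Eq. (13), on which the quantum neuron below relies, can be checked numerically by building the (n+1)-qubit product state, applying C^n(H) as a matrix, and summing the probabilities of the basis states whose target bit is 1. The following sketch is our own illustration (function names and the qubit ordering are assumptions, not from the paper).

```python
import numpy as np
from functools import reduce

def cn_h_matrix(n):
    """Matrix of C^n(H): apply H to the target only when the n control qubits are all |1>."""
    dim = 2 ** (n + 1)
    U = np.eye(dim)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    # The last two basis states, |1...10> and |1...11>, form the controlled block.
    U[dim - 2:, dim - 2:] = H
    return U

def target_one_probability(thetas, c, d):
    """Probability of observing |1> on the target qubit after C^n(H), cf. Eq. (13)."""
    controls = [np.array([np.cos(t), np.sin(t)]) for t in thetas]   # a_i = cos, b_i = sin
    target = np.array([c, d])
    state = reduce(np.kron, controls + [target])                    # (n+1)-qubit state
    out = cn_h_matrix(len(thetas)) @ state
    # Sum |amplitude|^2 over all basis states whose last (target) bit is 1.
    return float(np.sum(out[1::2] ** 2))

thetas = [0.4, 1.1, 0.7]
c, d = np.cos(0.3), np.sin(0.3)
b_prod = np.prod([np.sin(t) for t in thetas])
closed_form = 0.5 * b_prod ** 2 * (c ** 2 - d ** 2 - 2 * c * d) + d ** 2   # Eq. (13)
assert np.isclose(target_one_probability(thetas, c, d), closed_form)
```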


Fig. 3. Quantum neuron model. (a) Type I. (b) Type II.

Suppose |φ(t_1)⟩ = |0⟩. According to the definition of the quantum rotation gates and the multi-qubit controlled-Hadamard gates, |φ'(t_1)⟩ is given by

|φ'(t_1)⟩ = cos ψ(t_1)|0⟩ + sin ψ(t_1)|1⟩,   (18)

where, for Fig. 3(a), ψ(t_1) = arcsin(√0.5 ∏_{i=1}^{n} sin(θ_i(t_1) + θ̄_i(t_1))), and for Fig. 3(b), ψ(t_1) = arcsin(√0.5 ∏_{i=1}^{n} cos(θ_i(t_1) + θ̄_i(t_1))).

Let t = t_r, r = 2, …, q. From |φ(t_r)⟩ = |φ'(t_{r−1})⟩, the state |φ'(t_r)⟩ can be derived as

|φ'(t_r)⟩ = cos ψ(t_r)|0⟩ + sin ψ(t_r)|1⟩,   (19)

where

ψ(t_r) = arcsin(√(C_{r−1} S_r + sin^2 ψ(t_{r−1}))),   (20)

C_{r−1} = cos 2ψ(t_{r−1}) − sin 2ψ(t_{r−1}),   (21)

and, for Fig. 3(a), S_r = 0.5 ∏_{i=1}^{n} sin^2(θ_i(t_r) + θ̄_i(t_r)), while for Fig. 3(b), S_r = 0.5 ∏_{i=1}^{n} cos^2(θ_i(t_r) + θ̄_i(t_r)). The aggregate result of the quantum neuron on [0, T] is finally derived as

|y⟩ = |φ'(t_q)⟩ = cos ψ(t_q)|0⟩ + sin ψ(t_q)|1⟩.   (22)

In this paper, we define the output of the quantum neuron as the probability amplitude of the corresponding state in which |1⟩ is observed. Thus, after measuring |y⟩, the actual output of the quantum neuron is rewritten as

y = √(C_{q−1} S_q + sin^2 ψ(t_{q−1})).   (23)

It is worth noting that we cannot access the probability amplitudes of a quantum state in real quantum systems; we can at most infer them by repeating an experiment many times. However, our model is actually a "quantum-inspired NN" instead of a real "quantum NN". In our model we only use the calculation formula of the probability amplitude, without taking into account the methods of acquiring probability amplitudes in real quantum systems; in other words, our model can be run on a traditional computer.

4. The CHQNN model

The CHQNN model is illustrated in Fig. 4, where the hidden layer consists of quantum neurons and the output layer consists of common neurons. In Fig. 4, {|x_1(t_r)⟩}, {|x_2(t_r)⟩}, …, {|x_n(t_r)⟩} denote the input sequences, h_1, h_2, …, h_p denote the hidden outputs, the w_{jk} denote the connection weights of the output layer, and y_1, y_2, …, y_m denote the network outputs. The sigmoid function is used as the activation function in the output layer.

Fig. 4. Quantum neural networks model with sequence inputs.

In CHQNN, unlike an ANN, where each sample is described as a vector, each sample is described as a matrix. For example, the ith sample can be written as

[{x̄_1^i(t_r)}; {x̄_2^i(t_r)}; …; {x̄_n^i(t_r)}] = [x̄_1^i(t_1), x̄_1^i(t_2), …, x̄_1^i(t_q); x̄_2^i(t_1), x̄_2^i(t_2), …, x̄_2^i(t_q); …; x̄_n^i(t_1), x̄_n^i(t_2), …, x̄_n^i(t_q)].   (24)

Suppose that |x_i(t_r)⟩ = cos θ_i(t_r)|0⟩ + sin θ_i(t_r)|1⟩, that [0, T] denotes the time-domain interval, that 0 = t_1 < t_2 < ⋯ < t_q = T denote the sampling time points, and that |φ_j(t_1)⟩ = |0⟩, where j = 1, 2, …, p. Let

h̄_{jr} = √0.5 ∏_{i=1}^{n} sin(θ_i(t_r) + θ̄_{ij}(t_r)),  j = 1, 3, 5, …,
h̄_{jr} = √0.5 ∏_{i=1}^{n} cos(θ_i(t_r) + θ̄_{ij}(t_r)),  j = 2, 4, 6, ….   (25)

According to the input/output relationship of the quantum neuron, on the interval [0, t_r] the aggregate result of the jth quantum neuron in the hidden layer can be written as

h_j(t_1) = h̄_{j1},
h_j(t_r) = √(h̄_{jr}^2 h̃_{j,r−1} + h_j^2(t_{r−1})),   (26)

where

h̃_{j,r−1} = 1 − 2h_j^2(t_{r−1}) − 2h_j(t_{r−1})√(1 − h_j^2(t_{r−1})).   (27)

The jth output of the hidden layer (namely, the aggregate result on [0, T]) is given by

h_j = h_j(t_q).   (28)

The kth output of the output layer can be written as

y_k = 1 / (1 + e^{−Σ_{j=1}^{p} w_{jk} h_j}),   (29)

where i = 1, 2, …, n, j = 1, 2, …, p, k = 1, 2, …, m.
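To make the forward computation of Eqs. (25)–(29) easier to follow, here is a small NumPy sketch of our own (not code from the paper); the array shapes, parameter layout and function name are assumptions made for illustration only.

```python
import numpy as np

def chqnn_forward(theta, theta_bar, w):
    """Forward pass following Eqs. (25)-(29).

    theta:     (n, q) input angles, one row per input node, one column per time step
    theta_bar: (n, p, q) rotation angles of the hidden quantum neurons
    w:         (p, m) output-layer weights
    """
    n, q = theta.shape
    p = theta_bar.shape[1]
    h = np.zeros(p)
    for j in range(p):
        for r in range(q):
            phase = theta[:, r] + theta_bar[:, j, r]
            # Eq. (25): odd-numbered neurons (1-based) use sin, even-numbered use cos
            prod = np.prod(np.sin(phase)) if (j + 1) % 2 == 1 else np.prod(np.cos(phase))
            h_bar = np.sqrt(0.5) * prod
            if r == 0:
                h[j] = h_bar                                                   # Eq. (26), first step
            else:
                c = 1 - 2 * h[j] ** 2 - 2 * h[j] * np.sqrt(1 - h[j] ** 2)      # Eq. (27)
                h[j] = np.sqrt(h_bar ** 2 * c + h[j] ** 2)                     # Eq. (26), recursion
    return 1.0 / (1.0 + np.exp(-h @ w))                                        # Eq. (29), sigmoid output

# toy dimensions: n = 6 inputs, q = 7 time steps, p = 4 hidden quantum neurons, m = 1 output
rng = np.random.default_rng(0)
y = chqnn_forward(rng.uniform(0, np.pi / 2, (6, 7)),
                  rng.uniform(-np.pi / 2, np.pi / 2, (6, 4, 7)),
                  rng.uniform(-1, 1, (4, 1)))
```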


5. The CHQNN algorithm

5.1. The quantum description of samples

Suppose that the kth sequence sample in the n-dimensional input space is {X̄^k(t_r)} = [{x̄_1^k(t_r)}, {x̄_2^k(t_r)}, …, {x̄_n^k(t_r)}]^T, where r = 1, 2, …, q, k = 1, 2, …, M, q denotes the sequence length and M denotes the total number of samples. Let

Max_{i,r} = max(x̄_i^1(t_r), x̄_i^2(t_r), …, x̄_i^M(t_r)),  Min_{i,r} = min(x̄_i^1(t_r), x̄_i^2(t_r), …, x̄_i^M(t_r)),   (30)

and

θ_{ir}^k = (π/2) (x̄_i^k(t_r) − Min_{i,r}) / (Max_{i,r} − Min_{i,r})  if Max_{i,r} > Min_{i,r},
θ_{ir}^k = π/2  if Max_{i,r} = Min_{i,r} ≠ 0,
θ_{ir}^k = 0    if Max_{i,r} = Min_{i,r} = 0,   (31)

where i = 1, 2, …, n, r = 1, 2, …, q, k = 1, 2, …, M. These samples can then be described in the following quantum form:

{|X^k(t_r)⟩} = [{|x_1^k(t_r)⟩}, {|x_2^k(t_r)⟩}, …, {|x_n^k(t_r)⟩}]^T,   (32)

where |x_i^k(t_r)⟩ = cos(θ_{ir}^k)|0⟩ + sin(θ_{ir}^k)|1⟩.

Similarly, suppose the kth sample in the m-dimensional output space is {Ȳ^k} = [ȳ_1^k, ȳ_2^k, …, ȳ_m^k]^T, where k = 1, 2, …, M and M denotes the total number of samples. Let

Max_i = max(ȳ_i^1, ȳ_i^2, …, ȳ_i^M),  Min_i = min(ȳ_i^1, ȳ_i^2, …, ȳ_i^M);   (33)

then these output samples can be normalized according to the following equation:

ȳ_i^k = (ȳ_i^k − Min_i) / (Max_i − Min_i)  if Max_i > Min_i,
ȳ_i^k = 1  if Max_i = Min_i ≠ 0,
ȳ_i^k = 0  if Max_i = Min_i = 0,   (34)

where i = 1, 2, …, m.

5.2. The parameter adjustment of CHQNN

In CHQNN, the adjustable parameters include the rotation angles of the quantum rotation gates in the hidden layer and the weights in the output layer. Suppose the normalized desired outputs are ȳ_1, ȳ_2, …, ȳ_m, where m denotes the number of output nodes. The evaluation function is defined as

E = (1/2) Σ_{k=1}^{m} e_k^2 = (1/2) Σ_{k=1}^{m} (ȳ_k − y_k)^2.   (35)

Let

S_{jr} = h̄_{jr}^2 (1 − 2h_j^2(t_{r−1}) − 2h_j(t_{r−1})√(1 − h_j^2(t_{r−1}))).   (36)

According to the gradient descent algorithm in Ref. [16], the gradients with respect to the rotation angles of the quantum rotation gates can be calculated as follows:

∂e_k/∂h_j(t_q) = −y_k(1 − y_k) w_{jk},  k = 1, 2, …, m,   (37)

∂h_j(t_r)/∂h_j(t_{r−1}) = [(1 − 2h̄_{jr}^2) ĥ_{j,r−1} + h̄_{jr}^2 (2h_j^2(t_{r−1}) − 1)] / [h_j(t_r) √(1 − h_j^2(t_{r−1}))],   (38)

where ĥ_{j,r−1} = h_j(t_{r−1}) √(1 − h_j^2(t_{r−1})), and

∂h_j(t_r)/∂θ̄_{ij}(t_r) = S_{jr} cot(θ_i(t_r) + θ̄_{ij}(t_r)) / h_j(t_r),   j = 1, 3, 5, …,
∂h_j(t_r)/∂θ̄_{ij}(t_r) = −S_{jr} tan(θ_i(t_r) + θ̄_{ij}(t_r)) / h_j(t_r),  j = 2, 4, 6, ….   (39)

From the above equations, we obtain

∂e_k/∂θ̄_{ij}(t_r) = (∂e_k/∂h_j(t_q)) (∏_{s=r+1}^{q} ∂h_j(t_s)/∂h_j(t_{s−1})) (∂h_j(t_r)/∂θ̄_{ij}(t_r)),   (40)

where r = 1, 2, …, q. The gradient with respect to the connection weights in the output layer can be calculated as follows:

∂e_k/∂w_{jk} = −y_k(1 − y_k) h_j(t_q).   (41)

Because the gradient calculation is rather complicated, the gradient descent method does not converge easily. Hence we employ the Levenberg–Marquardt algorithm [16] to adjust the CHQNN parameters. Let x denote the parameter vector, v denote the error vector and J denote the Jacobian matrix; x, v and J are defined as follows:

x^T = [θ̄_{1,1}(t_1), …, θ̄_{n,p}(t_q), w_{1,1}, …, w_{p,m}],   (42)

v^T = [e_1, e_2, …, e_m],   (43)

J(x) = [ ∂e_1/∂θ̄_{1,1}(t_1) ⋯ ∂e_1/∂θ̄_{n,p}(t_q)  ∂e_1/∂w_{1,1} ⋯ ∂e_1/∂w_{p,m} ;
         ∂e_2/∂θ̄_{1,1}(t_1) ⋯ ∂e_2/∂θ̄_{n,p}(t_q)  ∂e_2/∂w_{1,1} ⋯ ∂e_2/∂w_{p,m} ;
         ⋮ ;
         ∂e_m/∂θ̄_{1,1}(t_1) ⋯ ∂e_m/∂θ̄_{n,p}(t_q)  ∂e_m/∂w_{1,1} ⋯ ∂e_m/∂w_{p,m} ].   (44)

According to the Levenberg–Marquardt algorithm, the CHQNN iterative equation is written as follows:

x_{t+1} = x_t − [J^T(x_t)J(x_t) + μ_t I]^{−1} J^T(x_t) v(x_t),   (45)

where x denotes the vector of adjustable parameters, t denotes the iteration step, I denotes the unit matrix, and μ_t is a small positive number that ensures the matrix J^T(x_t)J(x_t) + μ_t I is invertible.

6. Simulations

6.1. Data preparation

To evaluate the performance of the CHQNN approach, a series of experiments on the KDD CUP 1999 dataset was conducted. In these experiments, we implemented and evaluated the proposed method in Matlab (Version 7.1.0.246) on a Windows PC with a 2.19 GHz CPU and 1.00 GB RAM, and we also compared it with a BPNN (back-propagation neural network) with one hidden layer [16]. The CHQNN has the same structure and parameters as the BPNN in the simulations, and the same Levenberg–Marquardt algorithm [16] is employed in both models. The KDD Cup 1999 dataset comprises a fixed set of connection-based features which are related to normal and malicious traffic. Each connection includes 41 feature values, which form a vector. Note that some features are continuous and some are nominal. Since the proposed algorithm requires continuous values, the nominal values are first converted to continuous values. All data consist of one kind of normal connection and 22 kinds of intrusive ones. Random selection has been used in many applications to reduce the size of the dataset. In this study, we randomly select 790 records from the normal connections and 10 main categories of attacks to train CHQNN, and another 340 connections for testing, consisting of 50 normal connections


and 290 intrusive ones. Table 1 shows detailed information about the number of all records. It is important to note that the test data include specific attack types not present in the training data. This makes the intrusion detection task more realistic.
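The preprocessing of Eqs. (30)–(32) is a per-feature, per-time-step min–max scaling into [0, π/2] followed by the qubit encoding. The sketch below is our own illustration of how this could look (the array layout and function name are assumptions, not from the paper).

```python
import numpy as np

def encode_samples(X):
    """Map raw sequence samples to qubit angles following Eqs. (30)-(32).

    X: array of shape (M, n, q) -- M samples, n input nodes, sequence length q.
    Returns angles theta of the same shape; the qubit for each entry is
    cos(theta)|0> + sin(theta)|1>.
    """
    X = np.asarray(X, dtype=float)
    max_ = X.max(axis=0)                        # Max_{i,r} over the M samples, Eq. (30)
    min_ = X.min(axis=0)                        # Min_{i,r}
    theta = np.zeros_like(X)
    spread = max_ > min_
    theta[:, spread] = (X[:, spread] - min_[spread]) / (max_[spread] - min_[spread]) * np.pi / 2
    constant = (max_ == min_) & (max_ != 0)     # second case of Eq. (31); the zero case stays 0
    theta[:, constant] = np.pi / 2
    return theta

theta = encode_samples(np.random.rand(8, 6, 7))                    # 8 toy samples, n = 6, q = 7
amplitudes = np.stack([np.cos(theta), np.sin(theta)], axis=-1)     # qubit amplitudes, Eq. (32)
```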

6.2. Evaluation criteria

To facilitate the description, we assume that the output layer has only one output node. Some relevant evaluation criteria are defined as follows.

Approximation error: Suppose that ȳ_k and y_k denote the desired output and the actual output of the kth sample after training, respectively. The approximation error is defined as

E = max_{1≤k≤M} |ȳ_k − y_k|,   (46)

where M denotes the number of training samples.

Average approximation error: Suppose E_1, E_2, …, E_N denote the approximation errors over N simulations, respectively. The average approximation error is defined as

E_avg = (1/N) Σ_{i=1}^{N} E_i.   (47)

Error mean: Suppose ȳ_k and y_k^i denote the desired output and the actual output after the ith training, respectively. The error mean is defined as

E_mean = (1/(MN)) Σ_{i=1}^{N} Σ_{k=1}^{M} (ȳ_k − y_k^i),   (48)

where M denotes the number of training samples and N denotes the total number of training trials.

Error variance: Suppose ȳ_k and y_k denote the desired output and the actual output after training, respectively. Let e_k = ȳ_k − y_k. The error variance is defined as

E_var = (1/((M−1)N)) Σ_{i=1}^{N} Σ_{k=1}^{M} (e_k − (1/M) Σ_{j=1}^{M} e_j)^2,   (49)

where M denotes the number of training samples and N denotes the total number of training trials.

Iterative steps: In a training trial, the number of times all network parameters are adjusted is defined as the iterative steps.

Average iterative steps: Suppose S_1, S_2, …, S_N denote the iterative steps over N training trials, respectively. The average iterative steps are defined as

S_avg = (1/N) Σ_{i=1}^{N} S_i.   (50)

Convergence: Suppose E denotes the approximation error after training and ε denotes the target error. If E < ε, the network training is considered to have converged.

Convergence ratio: Suppose N denotes the total number of training trials and C denotes the number of convergent training trials. The convergence ratio is defined as

λ = C / N.   (51)
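For reference, the criteria of Eqs. (46)–(51) are straightforward to compute from the per-trial outputs. The following sketch is our own (names are illustrative) and assumes a single output node, as in the experiments.

```python
import numpy as np

def evaluation_metrics(desired, actual, target_error=0.01):
    """Evaluation criteria of Eqs. (46)-(51) for N trials on M samples.

    desired: (M,) normalized desired outputs
    actual:  (N, M) actual outputs, one row per training trial
    """
    errors = desired - actual                                    # e_k per trial and sample
    E = np.abs(errors).max(axis=1)                               # Eq. (46), per-trial approximation error
    E_avg = E.mean()                                             # Eq. (47)
    E_mean = errors.mean()                                       # Eq. (48)
    centered = errors - errors.mean(axis=1, keepdims=True)
    M = errors.shape[1]
    E_var = (centered ** 2).sum() / ((M - 1) * errors.shape[0])  # Eq. (49)
    ratio = (E < target_error).mean()                            # Eq. (51), convergence ratio C/N
    return E_avg, E_mean, E_var, ratio
```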

Table 1. Number and distribution of the training and test datasets.

Connection type   Training dataset   (%)      Testing dataset   (%)      Desired output
Normal            100                12.66    50                14.71    1
smurf             100                12.66    50                14.71    2
neptune           100                12.66    50                14.71    3
satan             80                 10.13    30                 8.82    4
ipsweep           80                 10.13    30                 8.82    5
portsweep         80                 10.13    30                 8.82    6
nmap              50                  6.33    20                 5.88    7
back              50                  6.33    20                 5.88    8
warezclient       50                  6.33    20                 5.88    9
teardrop          50                  6.33    20                 5.88    10
pod               50                  6.33    20                 5.88    11

6.3. Comparison of training results

According to the nature of the problem, there is only one output node in CHQNN and BPNN. In order to fully compare the approximation ability of the two models, the number of hidden nodes is set to 40, 41, …, 50 in turn. The normalized maximum absolute error is set to 0.01, and the maximum number of iterative steps is set to 200. The CHQNN rotation angles in the hidden layer are initialized to random numbers in (−π/2, π/2), and the connection weights in the output layer are initialized to random numbers in (−1, 1). For BPNN, all weights are initialized to random numbers in (−1, 1), and sigmoid functions are used as the activation functions in the hidden layer and the output layer.

In order to manifest the superiority of the CHQNN, for each sample we added a 42nd feature that is the same as the 41st feature. Obviously, BPNN then has 42 input nodes. For the number of input nodes of CHQNN, we apply the eight kinds of settings shown in Table 2. It is worth noting that, in CHQNN, an n × q matrix can be used to describe a single sequence sample. In general, BPNN cannot deal directly with a single n × q sequence sample; an n × q matrix is usually regarded as q n-dimensional vector samples. For a fair comparison, in BPNN we have expressed the n × q sequence samples as nq-dimensional vector samples. Therefore, in Table 2, the sequence length does not change for BPNN, and there is only one kind of BPNN42.

Table 2. The input nodes and the sequence length settings of CHQNN and BPNN.

CHQNN                              BPNN
Input nodes   Sequence length      Input nodes   Sequence length
1             42                   42            1
2             21                   42            1
3             14                   42            1
6             7                    42            1
7             6                    42            1
14            3                    42            1
21            2                    42            1
42            1                    42            1

Our experiment scheme is as follows. For each combination of input nodes and hidden nodes, the eight kinds of CHQNNs and the BPNN are run 10 times, with data randomly selected according to Table 1. Then we use six indicators, namely E_avg, E_mean, E_var, S_avg, λ and the average running time, to compare CHQNN and BPNN. The training results contrast is illustrated in Figs. 5–10, where CHQNNn_q denotes the CHQNN with n input nodes and sequence length q. Taking 45 hidden nodes as an example, the training results contrast is shown in Table 3.

As illustrated by Figs. 5–10, when the number of input nodes is 1, 2, 21 or 42, the performance of the CHQNNs is inferior to that of BPNN; when the number of input nodes is 3 or 14, the performance of the CHQNNs is roughly the same as that of BPNN; however, when the number of input nodes is 6 or 7, the performance of CHQNN is obviously superior to that of BPNN.

Fig. 5. The average approximation error contrast.
Fig. 6. The error mean contrast.
Fig. 7. The error variance contrast.
Fig. 8. The average iterative steps contrast.
Fig. 9. The average running time contrast.
Fig. 10. The convergence ratio contrast.

6.4. Comparison of testing results

Next, we investigate the generalization ability of CHQNN by applying the samples in the testing dataset. Based on the above experimental results, we only investigate CHQNN6_7 and CHQNN7_6. Our experiment scheme is that the two quantum-inspired networks and the BPNN perform 10 trainings on the training dataset and are tested on the testing dataset immediately after each training. The average prediction results of the 10 tests are shown in Tables 4–6, and the comparison of predicting precision is illustrated in Fig. 11. The experimental results show that the generalization ability of CHQNN6_7 and CHQNN7_6 is obviously superior to that of the corresponding BPNN.

Table 3. Training results contrast under 45 hidden nodes.

Input sequence   E_avg (CHQNN / BPNN)    E_mean (CHQNN / BPNN)       E_var (CHQNN / BPNN)   S_avg (CHQNN / BPNN)   λ(%) (CHQNN / BPNN)
1_42             5.8734 / 4.5825         3.8e-12 / -2.8494           9.5657 / 3.8265        200 / 191              0 / 40
2_21             5.7336 / 4.5825         -0.0043 / -2.8494           7.1563 / 3.8265        200 / 191              0 / 40
3_14             2.3582 / 4.5825         -0.1746 / -2.8494           1.9133 / 3.8265        181.3 / 191            60 / 40
6_7              0.1296 / 4.5825         -0.0014 / -2.8494           0.0003 / 3.8265        88.4 / 191             90 / 40
7_6              0.1303 / 4.5825         0.0018 / -2.8494            0.0005 / 3.8265        127.8 / 191            80 / 40
14_3             0.2679 / 4.5825         -0.0081 / -2.8494           0.0016 / 3.8265        178.3 / 191            40 / 40
21_2             5.2201 / 4.5825         -0.1941 / -2.8494           7.3620 / 3.8265        200 / 191              0 / 40
42_1             5.6785 / 4.5825         -0.8221 / -2.8494           9.4206 / 3.8265        200 / 191              0 / 40

Table 4. The number of correct prediction results of CHQNN6_7 (columns: hidden nodes 40-50, then average).

Connection type   40   41   42   43   44   45   46   47   48   49   50   Average
Normal            50   50   49   50   45   45   50   45   45   50   50   44
smurf             35   45   32   42   33   44   39   40   39   42   50   36
neptune           49   50   35   50   35   50   45   45   40   44   50   40
satan             18   23   19   23   19   24   26   23   21   26   28   20
ipsweep           20   26   21   21   19   24   25   20   22   24   24   20
portsweep         17   25   18   23   20   25   23   23   22   26   27   20
nmap              15   19   13   19   14   19   17   17   15   17   19   15
back               9   14   12   15   12   14   15   11   13   15   17   12
warezclient       11   12    6    8    9    9    8    7    5    9   10    8
teardrop          18   19   16   20   14   20   18   17   16   18   20   16
pod               18   19   17   19   15   18   17   17   17   17   19   16
Total            259  300  237  289  233  293  283  264  254  287  313  245

Table 5. The number of correct prediction results of CHQNN7_6 (columns: hidden nodes 40-50, then average).

Connection type   40   41   42   43   44   45   46   47   48   49   50   Average
Normal            40   45   50   50   40   45   50   50   50   45   35   42
smurf             29   40   40   37   29   41   38   43   38   36   28   34
neptune           35   45   43   45   35   45   50   50   40   40   30   39
satan             19   23   26   25   20   25   28   28   23   23   15   22
ipsweep           18   25   25   26   19   25   23   27   22   22   17   21
portsweep         21   25   25   26   20   26   29   29   24   23   16   23
nmap              13   17   16   17   13   16   16   19   15   15   11   14
back              12   15   13   14   12   15   17   17   14   11   10   13
warezclient        5    5    7    5    5    5    7    7    6    5    4    5
teardrop          14   18   18   18   14   18   20   20   16   16   12   16
pod               17   19   17   17   17   19   19   19   15   17   17   16
Total            223  276  280  279  224  281  297  309  261  253  195  244

These experimental results can be explained simply as follows. For the processing of input information, CHQNN and BPNN take two different approaches. CHQNN directly receives a discrete input sequence; using the quantum information processing mechanism, the input is cyclically mapped to the output of the quantum controlled-Hadamard gates in the hidden layer. As the controlled-Hadamard gate's output is an entangled state of multiple qubits, this mapping is highly nonlinear, which gives CHQNN stronger approximation ability. From the CHQNN algorithm, we can see that the sequence length denotes the depth of pattern memory, and the number of input nodes denotes the breadth of pattern memory. When the depth and the breadth are appropriately matched, the CHQNN shows excellent performance. The BPNN, by contrast, does not directly deal with a discrete input sequence; it can only obtain the sample characteristics by way of breadth instead of depth. Hence, in the BPNN information processing there is inevitably a loss of sample characteristics, which affects its approximation and generalization ability.

Table 6. The number of correct prediction results of BPNN (columns: hidden nodes 40-50, then average).

Connection type   40   41   42   43   44   45   46   47   48   49   50   Average
Normal            50   35   50   35   30   35   50   35   35   35   40   35
smurf             25   20   39   20   30   30   40    5   24   25   30   23
neptune           25   15   40   20   30   35   39    5   20   25   30   23
satan             14   12   26   11   17   17   21    6   14   14   17   14
ipsweep           14   11   25   11   17   16   22    6   14   13   16   14
portsweep         15   12   27   12   18   18   24    6   15   15   18   15
nmap               9    5   15    7   11   13   15    2    8    9   11    9
back               9    7   15    7   12   11   14    3    8    9   11    9
warezclient        5    5   10    4    8    9    7    3    5    6    6    6
teardrop          10    8   18    8   12   12   14    4    8   10   12    9
pod               10   14   18   14   20   18   16   10   16   16   16   14
Total            186  144  283  150  204  213  261   84  167  178  207  170

Fig. 11. The comparison of predicting precision under different numbers of hidden nodes.

6.5. The theoretical explanation of the experimental results

In this section, we theoretically explain the above experimental results. Assume that n denotes the number of input nodes, q denotes the sequence length, p denotes the number of hidden nodes, m denotes the number of output nodes, and that the product nq is approximately a constant. It is clear that the number of adjustable parameters in CHQNN and BPNN is the same, i.e., equal to npq + pm. The weight adjustment formulas in the output layers of CHQNN and BPNN are also the same, but their parameter adjustments in the hidden layer are completely different. The adjustment of the hidden parameters in CHQNN is much more complex than that in BPNN. In BPNN, each hidden parameter adjustment only involves two derivative calculations; in CHQNN, each hidden-layer parameter adjustment involves at least two and at most q + 1 derivative calculations.

In CHQNN, when q = 1, although the number of input nodes is the greatest possible, the calculation of the hidden-layer output and of the hidden parameter adjustment is also the simplest, which directly leads to a reduction of the approximation ability. When n = 1, the calculation of the hidden-layer output is the most complex, which gives the CHQNN strong nonlinear mapping ability. However, at this time, the calculation of the hidden parameter adjustment is also the most complex. A large number of derivative calculations can drive the parameter adjustments towards zero or infinity, which can hinder the convergence of the training process and lead to a reduction of the approximation ability. Hence, when q = 1 or n = 1, the approximation ability of CHQNN is inferior to that of BPNN. When n > 1 and q > 1, the approximation ability of CHQNN tends to improve, and under certain conditions the approximation ability of CHQNN will be superior to that of BPNN. The above analysis is consistent with the experimental results. In addition, what is the accurate relationship between n and q that makes the approximation ability of CHQNN the strongest? This problem needs further study and usually depends on the specific issue. Our conclusion based on the experiments is as follows: when n ≈ q, CHQNNn_q is superior to the BPNN with nq input nodes.

It is worth pointing out that CHQNN is potentially much more computationally efficient than all the models referenced above in the Introduction. The efficiency of many quantum algorithms comes directly from quantum parallelism, which is a fundamental feature of quantum computation. Heuristically, and at the risk of over-simplifying, quantum parallelism allows quantum computers to evaluate a function f(x) for many different values of x simultaneously. Although quantum simulation requires many resources in general, quantum parallelism leads to very high computational efficiency by using the superposition of quantum states. In CHQNN, the input samples have been converted into corresponding quantum superposition states after preprocessing. Hence, for the many quantum rotation gates and controlled-Hadamard gates involved in CHQNN, information processing can be performed simultaneously, which greatly improves the computational efficiency. Because the above experiments were performed on a classical computer, this quantum parallelism has not been exploited. However, the efficient computational ability of CHQNN is bound to stand out on future quantum computers.

7. Conclusions and future directions

This paper proposes a quantum neural networks model with sequence inputs based on the principles of quantum computing. The experimental results reveal that, with the application of the information processing mechanism of quantum controlled-Hadamard gates, CHQNN can effectively capture the sample characteristics by way of both breadth and depth, and obviously enhances the approximation and generalization ability when the number of input nodes is close to the sequence length. The continuity and computational complexity of the CHQNN are subject to further research.

Acknowledgment

We thank the two anonymous reviewers sincerely for their many constructive comments and suggestions, which have tremendously improved the presentation and quality of this paper. This work was supported by the National Natural Science Foundation of China (Grant no. 61170132).

References

[1] H. Matsushita, T. Mihira, T. Takizawa, Chemical reaction process and the single crystal growth of CuInS2 compound, J. Cryst. Growth 197 (1999) 169–176.
[2] C. Morana, Frequency domain principal components estimation of fractionally cointegrated processes: some new results and an application to stock market volatility, Physica A 355 (2005) 165–175.
[3] A.C. Tsoi, A.D. Back, Locally recurrent globally feedforward networks: a critical review of architectures, IEEE Trans. Neural Networks 5 (2) (1994) 229–239.
[4] S. Kak, On quantum neural computing, Inf. Sci. 83 (1995) 143–160.
[5] G. Purushothaman, N.B. Karayiannis, Quantum neural networks (QNN's): inherently fuzzy feedforward neural networks, IEEE Trans. Neural Networks 8 (3) (1997) 679–693.


[6] M. Zak, C.P. Williams, Quantum neural nets, Int. J. Theor. Phys. 37 (2) (1998) 651–684.
[7] N. Matsui, M. Takai, H. Nishimura, A network model based on qubit-like neuron corresponding to quantum circuit, Electron. Commun. Jpn. 83 (10) (2000) 67–73.
[8] N. Matsui, N. Kouda, H. Nishimura, Neural network based on QBP and its performance, in: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, vol. 3, 2000, pp. 247–252.
[9] M.V. Altaisky, Quantum neural network, Preprint arxiv.org:quant-ph/0107012, 2001, pp. 1–4.
[10] F. Shafee, Neural networks with quantum gated nodes, Eng. Appl. Artif. Intell. 20 (2007) 429–437.
[11] P.C. Li, S.Y. Li, Learning algorithm and application of quantum BP neural networks based on universal quantum gates, J. Syst. Eng. Electron. 19 (1) (2008) 167–174.
[12] M. Svitek, Wave probabilities and quantum entanglement, Neural Network World 5 (2008) 401–406.
[13] M. Svitek, Quasi-non-ergodic probabilistic systems and wave probabilistic functions, Neural Network World 3 (2009) 307–320.
[14] A.J. da Silva, W.R. de Oliveira, T.B. Ludermir, Classical and superposed learning for quantum weightless neural networks, Neurocomputing 75 (2012) 52–60.
[15] G. Benebti, G. Casati, G. Strini, Principles of Quantum Computation and Information Volume I: Basic Concepts, World Scientific, 2004, pp. 108–109.
[16] M.T. Hagan, H.B. Demuth, M.H. Beale, Neural Networks Design, PWS Publishing Company, USA, 1996, pp. 391–399.

Fuhua Shang received the M.S. and Ph.D. degrees from Harbin Institute of Technology, China, in 1990 and 2007, respectively. Currently, he is a professor in the School of Computer and Information Technology, Northeast Petroleum University, China. His current research interests include artificial intelligence and machine learning.

Panchi Li received the B.S. and M.S. degrees from Northeast Petroleum University, China, in 1998 and 2004, respectively, and the Ph.D. degree from Harbin Institute of Technology, China, in 2009. Currently, he is a professor in the School of Computer and Information Technology, Northeast Petroleum University, China. His current research interests include quantum neural networks and quantum optimization algorithms.

Xin Li received the B.S. and M.S. degrees from Northeast Petroleum University, China, in 2000 and 2003, respectively, and the Ph.D. degree from Dalian University of Technology, China, in 2010. Currently, he is an associate professor in the School of Computer and Information Technology, Northeast Petroleum University, China. His current research interests include quantum searching algorithms and quantum neural networks.

Hong Xiao received the B.S. and M.S. degrees from Northeast Petroleum University, China, in 2001 and 2004, respectively, and then she became a teacher there. Currently, she is a Ph.D. candidate in Northeast Petroleum University, China. Her current research interests include neural networks and evolutionary algorithms.

Maojun Cao received the B.S. and M.S. degrees from Northeast Petroleum University, China, in 2001 and 2008, respectively. Currently, he is an associate professor in the School of Computer and Information Technology, Northeast Petroleum University, China. His current research interests include artificial intelligence and machine learning.

Xifeng Tong received the M.S. and Ph.D. degrees from Harbin Institute of Technology, China, in 2003 and 2008, respectively. Currently, he is an associate professor in the School of Computer and Information Technology, Northeast Petroleum University, China. His current research interests include pattern recognition and image processing.