A novel CNN based security guaranteed image watermarking generation scenario for smart city applications


Daming Li, Lianbing Deng, Brij Bhooshan Gupta, Haoxiang Wang, Chang Choi

PII: S0020-0255(18)30145-2
DOI: 10.1016/j.ins.2018.02.060
Reference: INS 13465

To appear in: Information Sciences

Received date: 17 August 2017
Revised date: 19 February 2018
Accepted date: 26 February 2018

Please cite this article as: Daming Li, Lianbing Deng, Brij Bhooshan Gupta, Haoxiang Wang, Chang Choi, A Novel CNN based Security Guaranteed Image Watermarking Generation Scenario for Smart City Applications, Information Sciences (2018), doi: 10.1016/j.ins.2018.02.060

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


A Novel CNN based Security Guaranteed Image Watermarking Generation Scenario for Smart City Applications Daming Li1,2,3, Lianbing Deng4,5, Brij Bhooshan Gupta6, Haoxiang Wang7, Chang Choi8,9*

1. The Post-Doctoral Research Center of Zhuhai Da Hengqin Science and Technology Development Co., Ltd.
2. City University of Macau
3. International Postdoctoral Science and Technology Research Institute Co., Ltd.
4. Huazhong University of Science and Technology
5. Zhuhai Da Hengqin Science and Technology Development Co., Ltd.
6. National Institute of Technology Kurukshetra, Haryana, India
7. Cornell University, USA
8. Dept. of Computer Engineering, Chosun University, Rep. of Korea
9. IT Research Institute, Chosun University, Rep. of Korea

* Corresponding Author

Abstract

The rise of machine learning increases current computing capabilities and paves the way to novel disruptive applications. In the current era of big data, the application of image retrieval technology to large-scale data is a popular research area. To ensure the robustness and security of digital image watermarking, we propose a novel algorithm using synergetic neural networks. The algorithm first processes a meaningful gray watermark image and then embeds it as a watermark signal into the block Discrete Cosine Transform (DCT) component. The companion algorithm for detection and extraction of the watermark uses a cooperative neural network, where the suspected watermark signal is used as the input and the output is the result of the recognition process. The simulation experiments show that the algorithm can complete certain image processing operations with improved performance, not only simultaneously completing watermark detection and extraction, but also efficiently determining the watermark attribution. Compared with other state-of-the-art models, the proposed model obtains an optimal Peak Signal-to-Noise Ratio (PSNR).

Keywords: Convolutional neural network; image watermark; generation scenario; algorithm design; smart cities

1. Introduction

With the rapid development of electronic information technology and of chip design and production in the nanometer era, computing power and speed have improved considerably. However, the current and expected demands keep growing, since computers perform ever more services in our lives, and these services require that the computer itself perceives and recognizes the environment in a human-like way and makes correct judgments in complex situations. Picture information is the most intuitive and easily accessible information in our environment, and it requires the computer to recognize and judge the environment and to recognize image information intelligently. Deep learning is a new research field in machine learning [17, 40], and is used to construct a deep network to extract target features and then identify the surrounding environment. The convolutional neural network offers invariance to translation, rotation and distortion in image processing [15, 41], so dealing with images is faster and more convenient. The convolutional neural network has dramatically improved the ability of the computer to recognize the surrounding environment, making the computer more intelligent. Convolutional neural networks have powerful feature extraction capabilities, leading to important applications in image classification, recognition, target tracking and other fields.

In 1986, Rumelhart and McClelland proposed the Back Propagation (BP) algorithm, which trains a neural network by propagating the output error backwards through the network. Using the BP algorithm, the neural network can learn relevant statistical information from a suitable amount of training data, and the mathematical information learned can reflect the function mapping relation of the input-output data model.

In 2006, Hinton proposed the deep belief network. Since then, deep learning has been widely used in academic circles. Deep learning not only changes traditional methods of machine learning but also affects our understanding of human perception. To date, it has resulted in breakthroughs in the fields of speech recognition and image understanding. Various related algorithms and models have made significant advances, and deep learning is now widely used in image classification, speech recognition, natural language processing and other fields [8, 11, 14].

In 2013, Baidu Research founded its Institute of Deep Learning. Under the guidance of artificial intelligence expert Wu Enda (Andrew Ng), Baidu Research has launched a series of artificial intelligence products: unmanned driving technology, the DuerOS voice interactive computing platform, face recognition technology, Miller medical products and other outstanding products [19]. In addition, the ImageNet image recognition competition has also produced a series of classic neural network structures, such as VGG, Fast R-CNN, SPP-net and others. It can be said that artificial intelligence technology has achieved unprecedented development in recent years. Figure 1 shows the learning models in neural networks.

(a) Shallow learning model without hidden layer
(b) Deep learning model with multiple hidden layers

Figure 1. Learning models in neural networks

At present, the standard deep learning pipeline performs target recognition through feature expression and classifiers. Many achievements have been made in speech recognition, image processing, machine translation and other fields [16, 18].

The structure of a convolutional neural network is more complicated than that of a traditional neural network. The hidden layers of a convolutional neural network consist of convolution and adjacent-sampling operations whose kernels connect only to local receptive fields, so the shared training parameters of the convolutional neural network not only reduce the complexity of the network, but also reduce the risk of overfitting [3, 38]. Convolutional neural networks consist of two main parts: the convolution kernel and the down-sampling kernel. The convolution kernel mainly processes the upper-layer image to extract the image features, and the down-sampling kernel reduces the upper-level data to reduce the complexity of the neural network [5, 21, 22].

In convolutional neural networks, the receptive field of each neuron extracts local features, such as image contours, color and other characteristics (including conventional human characteristics), and the network identifies the extracted features in such a way that they are independent of their position in the image.

Against this background, this paper presents a novel CNN-based security-guaranteed image watermarking generation scenario for smart city applications. In current anti-geometric-attack watermarking algorithms, the content-based watermark synchronization scheme identifies watermark embedding positions using stable image feature points, embeds the watermark in a local neighborhood of each feature point and locates the watermark using the feature points. Hence, these methods show improved robustness. The gradient direction distribution of the neighboring pixels of a feature point is used to specify the direction information for each feature point. In practice, sampling is performed in a neighborhood window centered on the feature point, and a histogram is used to accumulate the gradient directions of the neighboring pixels. The peak of the histogram represents the main direction of the neighborhood gradient at the feature point, that is, the main direction of the feature point. With the integration of the CNN, the model achieves better robustness [26, 27, 28, 29, 43].

The rest of the paper is organized as follows. In section 2, we introduce the convolutional neural network. In section 3, we discuss the proposed watermark embedding and extraction method and in section 4, the experiments and simulations are presented. Finally, in section 5, we summarize the work.

2. Convolutional Neural Networks


2.1 Region-Based Convolutional Neural Network

R-CNN is essentially a region-based method that combines region-proposal generation and candidate trimming with supervised pretraining and domain-specific fine-tuning using location information for object detection; it replaced the traditional unsupervised pretraining [2, 42, 46].

Since the fully connected input layer of a CNN has a fixed size, the algorithm first divides each picture into 1,000-2,000 candidate regions. The images in these candidate regions are transformed in order to generate candidate images of fixed size, and commonly used pretraining model parameters are used in feature extraction training. In order to increase the number of training samples, the model also generates a candidate frame and a calibration label for training samples [7, 20]. R-CNN uses an SVM to classify the feature vectors. When training the SVM, the candidate box is input to the SVM classifier, and the classifier is trained using the features of the convolutional neural network and the SVM calibration results. In testing, all the candidate frames of the image are fed, via their convolutional neural network features, into the SVM classifier, and the scores for each class are obtained [30, 33].

However, in processing a single picture, R-CNN must deal with 1,000-2,000 candidate regions, and all the extracted features of the selected regions must be stored before the operation, thus requiring large amounts of computing hardware and increasing the processing time for each picture. The time cost of the R-CNN computation is enormous; it cannot achieve real-time results. Moreover, when R-CNN warps the candidate regions, the image is distorted and part of the information is lost.

2.2 Fast R-CNN

Fast R-CNN is a modified target-detection and localization algorithm based on the convolutional neural network. Compared to R-CNN, Fast R-CNN changes from a single input to a dual input, with two outputs after the fully connected layer, and an RoI pooling layer is introduced. Fast R-CNN still spends the same runtime on generating the candidate regions, but it runs the convolutional neural network once over the original image for feature extraction and maps the candidate-region coordinates of the original image into the RoI layer, so that each candidate region yields a fixed-size feature vector. Fast R-CNN has dramatically improved the processing speed, but there are still bottlenecks in the computation of candidate regions, which is one factor limiting the speed of Fast R-CNN [6, 10, 13]. Figure 2 shows sketch maps of the two CNNs.

(a) Sketch map of R-CNN
(b) Sketch map of Fast R-CNN

Figure 2. Sketch maps of two CNNs

2.3 Deep Learning Network Model


Image super-resolution based on deep learning relies on an external library to obtain the corresponding prior information for the deep learning neural network to realize the image super-resolution [4, 23, 31, 34, 39, 44, 45].

The convolutional neural network can effectively reduce the training parameters of the network and simplify the neural network; it also has high adaptability. Among CNNs, the most representative is the super-resolution convolutional neural network (SRCNN), whose pipeline consists of three layers: feature extraction, nonlinear mapping and high-resolution image reconstruction. Feature extraction is the process of extracting blocks from an initial image to obtain a feature map of the input image, as follows:

$F_1(Y) = \max(0,\ W_1 * Y + B_1)$   (1)

where $Y$ represents an initial high-resolution image, $*$ represents the convolution operation, $W_1$ is the convolution kernel and $B_1$ is the neuron bias vector. Nonlinear mapping transforms feature vectors from low-resolution space to high-resolution space. The process can be represented by Equation 2:

$F_2(Y) = \max(0,\ W_2 * F_1(Y) + B_2)$   (2)

High-resolution image reconstruction uses the block-based output feature graph to generate the final high-resolution image:

$F(Y) = W_3 * F_2(Y) + B_3$   (3)

where $W_3$ can be regarded as a mean filter and the whole reconstruction process is a linear operation [32].

The SRCNN parameters can be expressed as:

$\Theta = \{W_1, W_2, W_3, B_1, B_2, B_3\}$   (4)

The training process of the whole network estimates and optimizes these parameters. The mean-square error can be calculated as follows:

$L(\Theta) = \frac{1}{n}\sum_{i=1}^{n} \left\| F(Y_i; \Theta) - X_i \right\|^2$   (5)
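The three layers of Eqs. (1)-(5) map directly onto a small convolutional network. The following is a minimal PyTorch sketch; the kernel sizes (9-1-5) and channel widths (64, 32) follow the commonly used SRCNN configuration and are assumptions, since the text does not fix them at this point.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Minimal sketch of the three-layer SRCNN in Eqs. (1)-(3).

    Assumed 9-1-5 kernels with 64/32 channels (common SRCNN setup).
    """
    def __init__(self):
        super().__init__()
        self.feature_extraction = nn.Conv2d(1, 64, kernel_size=9, padding=4)  # Eq. (1)
        self.nonlinear_mapping = nn.Conv2d(64, 32, kernel_size=1)             # Eq. (2)
        self.reconstruction = nn.Conv2d(32, 1, kernel_size=5, padding=2)      # Eq. (3)
        self.relu = nn.ReLU()

    def forward(self, y):
        f1 = self.relu(self.feature_extraction(y))  # F1(Y) = max(0, W1*Y + B1)
        f2 = self.relu(self.nonlinear_mapping(f1))  # F2(Y) = max(0, W2*F1(Y) + B2)
        return self.reconstruction(f2)              # F(Y) = W3*F2(Y) + B3 (linear)

# Training minimizes the mean-square error of Eq. (5):
model = SRCNN()
loss_fn = nn.MSELoss()  # L(theta) = (1/n) * sum ||F(Y_i; theta) - X_i||^2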


2.4 An Improved Image Algorithm Based on the Convolutional Neural Network

The VGG group of the University of Oxford applied convolutional neural networks to image classification and visual recognition and achieved satisfactory results; their entry was among the best in the ImageNet competition. In our convolutional neural network, we use the algorithm proposed by the VGG group to achieve an improved image algorithm based on the convolutional neural network.

2.4.1 Convolution Kernel Size

The three-layer convolution of SRCNN is a 9-1-5 model, while the improved algorithm is a 3-3-3 model: to compare against the SRCNN algorithm, we also use three stacked layers, but the size of the convolution kernel in each layer is 3 × 3. To highlight the effectiveness of the improved algorithm in comparison with the SRCNN algorithm, a series of training and learning processes were carried out, and the results of various network learning models were obtained [12]. Applying the super-resolution processing of the SRCNN algorithm and of the improved algorithm with the same number of iterations under two different network learning models, we obtained the corresponding PSNR and SSIM values. As a test image, a butterfly map, for example, was processed by a network model with different numbers of iterations, and the resulting trends of PSNR and SSIM were recorded. In Figures 3 and 4, we present the PSNR and SSIM for various algorithms.
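As a rough illustration of the 9-1-5 versus 3-3-3 trade-off, the sketch below counts the convolution weights of both configurations; the channel widths (64 and 32) are assumed from the common SRCNN setup and are not stated in the text. It is consistent with the remark below that, for the same number of iterations, the improved algorithm's computational complexity is slightly higher.

```python
def conv_params(k, c_in, c_out):
    """Weights of one convolution layer with k x k kernels (biases ignored)."""
    return k * k * c_in * c_out

# Assumed channel widths: 1 -> 64 -> 32 -> 1 (common SRCNN setup).
srcnn_9_1_5 = conv_params(9, 1, 64) + conv_params(1, 64, 32) + conv_params(5, 32, 1)
improved_3_3_3 = conv_params(3, 1, 64) + conv_params(3, 64, 32) + conv_params(3, 32, 1)

print(srcnn_9_1_5)     # 5184 + 2048 + 800 = 8032
print(improved_3_3_3)  # 576 + 18432 + 288 = 19296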


Figure 3. PSNR for various algorithms


Figure 4. SSIM for various algorithms

With regard to both PSNR and SSIM, the improved algorithm is superior to the SRCNN algorithm and the BI algorithm. Compared with the SRCNN algorithm, the improved algorithm achieves the best effect in the training process, and the convergence speed is faster [24, 25]. The results show that the improved algorithm can produce a better super-resolution effect and can achieve better results with fewer iterations, which can significantly reduce training time. The computational complexity of the improved algorithm is slightly higher than that of SRCNN for the same number of iterations.

2.5 Hash-Function Learning Based on a Deep Neural Network


The goal of supervised hash learning is to learn q hash functions from the image training set, where q is the length of the hash encoding. Current mainstream hash algorithms are used in image retrieval and achieve good results, but they have some disadvantages. These methods are based on constructing artificial image features, e.g., GIST features, followed by hash-function learning on the extracted features. If the feature extraction in the first step does not preserve the similarities between images, then the learning of the subsequent hash function cannot satisfy the requirement. Therefore, the question of how to encode a picture into image features which are beneficial to hash-function learning has become an important problem for research into hash functions in image retrieval. To illustrate, we first define the image similarity matrix in the hashing algorithm, as follows:

ACCEPTED MANUSCRIPT 1 Sij   1

(6)

For a given training set of hash functions, we can construct the similarity matrix according to the information of the image set. The training of hash functions based on neural networks requires that each image in the training set has corresponding hash codes, and hence the hash encoding of the training set is the basis of the training of the hash function.

functions: n

n

min  ( Sij  H

i 1 j 1

1 1 H .H Tj .)2  min || S  HH T ||2F q q

subject , to : H {1,1}nq

CR IP T

We can learn the hash coding of the image training set by minimizing the following objective

(7)

n

n

AN US

1 min || S  HH T ||2F  min || H . j H Tj  (qS   H .c H cT ) ||2F H q c j

 min  ( H ljH hj  Rlk )2 H

l 1 k 1

(8)

M

For ease of derivation and description, we define R as follows: R  qS   H c H cT

(9)

ED

c j

Since the similarity matrix S is a symmetric matrix, we can deduce that R is also a symmetric

PT

matrix. Thus, we have: n

n

min g ( H ij )   ( H lj H kj  Rlk )2  ( H ij2  Rii )2  2 ( H ij H kj  Rik )2  cons tan t l 1 k 1

(10)

k i

CE

Hij

Using the fact that matrix R is a symmetric matrix, the corresponding optimization problem is

AC

given by:

min g ( H ij  d ) d

subject , to : 1  Hij  d  1

(11)

To facilitate the search for d, a Taylor expansion is used: g ( H ij  d )  g ( H ij )  g '( H ij )d 

1 g ''( H ij )d 2 2

(12)

ACCEPTED MANUSCRIPT n

g '( H ij )  4 ( H ij H kj  Rik )H kj

(13)

g ''( H ij )  12H ij2  4Rii  4 H kj2

(14)

k 1

k i

n

g '( H ij ) g ''( H ij )



k 1

(15)

12 H ij2  4 Rii  4 H kj2 k i

CR IP T

d 

4 ( H ij H kj  Rik )H kj

Neural networks have been applied in many fields and have achieved remarkable results. In the field of image recognition, feeding the pixels of an image directly into a neural network has obvious shortcomings: the image is susceptible to translation, scale, tilt, rotation and other deformations. Because of these disadvantages, much research on image recognition involves designing image classification algorithms based on artificial features. The specific process is that of image preprocessing; feature extraction is used to represent the image, using features such as GIST, SIFT, BOG and others, and then the features are regarded as the input of a machine learning classification algorithm. Depending on the problem, it is often necessary to manually formulate different features or to combine multiple features to improve the efficiency. These artificial features are often not generic and require considerable time and expertise to acquire.

CNNs belong to the Hubel-Wiesel architecture, based on Hubel and Wiesel's research on the primary visual cortex of cats in 1962. The Japanese scholar Fukushima proposed the first computer simulation model based on the concept of the neocognitron, but this model did not use a global supervised training algorithm. Later, LeCun et al. constructed a globally supervised model using the backpropagation algorithm to train the whole model, and then used the model to achieve a significant advance in digit recognition.

For hash-function learning, the proposed method is different from supervised hashing methods that use the artificial features of the image as input. The hash algorithm in this paper uses the original image as input in the second stage and uses the hash codes of the training set learned in the first stage to learn the target hash function.
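The column-wise coordinate descent of Eqs. (7)-(15) can be sketched as follows on a toy similarity matrix; the random bipolar initialization and the final sign binarization are assumptions not specified in the text, and all names are illustrative.

```python
import numpy as np

def learn_hash_codes(S, q, n_sweeps=10, seed=0):
    """Coordinate-descent sketch for min ||S - (1/q) H H^T||_F^2, Eqs. (7)-(15)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.choice([-1.0, 1.0], size=(n, q))   # random bipolar initialization
    for _ in range(n_sweeps):
        for j in range(q):
            # Eq. (9): R = qS - sum_{c != j} H_c H_c^T (independent of column j)
            R = q * S - (H @ H.T - np.outer(H[:, j], H[:, j]))
            for i in range(n):
                h, col = H[i, j], H[:, j]
                mask = np.arange(n) != i
                g1 = 4 * np.sum((h * col[mask] - R[i, mask]) * col[mask])  # Eq. (13)
                g2 = 12 * h**2 - 4 * R[i, i] + 4 * np.sum(col[mask]**2)    # Eq. (14)
                if g2 > 0:
                    d = -g1 / g2                                           # Eq. (15)
                    H[i, j] = np.clip(h + d, -1.0, 1.0)                    # Eq. (11)
    return np.sign(H)   # binarize the relaxed codes

# Toy example: items 1 and 2 are similar, item 3 is dissimilar to both.
S = np.array([[1., 1., -1.], [1., 1., -1.], [-1., -1., 1.]])
print(learn_hash_codes(S, q=4))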

2.6 BP Neural Network

The BP network is a kind of feedforward neural network. Trained on input-output sample pairs, it can realize an arbitrary nonlinear mapping from input to output; the essence of the method is steepest descent on the output error. The approximate steepest-descent updates for the mean-square error are as follows:

$w_{i,j}^m(k+1) = w_{i,j}^m(k) - \alpha \frac{\partial \hat{F}}{\partial w_{i,j}^m}$   (16)

$b_{i,j}^m(k+1) = b_{i,j}^m(k) - \alpha \frac{\partial \hat{F}}{\partial b_{i,j}^m}$   (17)

where $\alpha$ represents the learning rate and $\hat{F}$ is the expectation value of the mean-square error.

A neural network is a model that mimics the transmission and processing of information in animal neurons. A complex network structure is interconnected by simple processing units (neurons), and the whole neural network is a complex nonlinear system. The transformation process can be described as follows:

$g(x) = b + w\,h(c + vx)$   (18)


The general structure of the neural network is presented in Figure 5.


Figure 5. Structure of neural network

The neuron is the basic unit of the neural network; each neuron is a multi-input, single-output information processing unit [22]. The processing of a neuron can be expressed as:

$u_i = \sum_{j=1}^{d} w_{ij}x_j + b_i, \qquad z_i = h(u_i)$   (19)

Figure 6. Neural network model and the process of single-neuron information processing

The neural network is a hierarchical directed graph, as shown in Figure 6. There is no connection between the nodes of the same layer. The upper-layer input is transmitted to the lower neurons via a nonlinear transformation. The number of hidden layers, the number of neurons in each layer and the choice of nonlinear functions are key to the formation of neural networks. The neural network uses the BP algorithm to learn statistical rules from a large number of training samples, in order to predict unknown events. A neural network with few hidden layers is called a shallow model; it has a limited number of computational units and a limited representation of features. To date, the neural network has been developed into deep models by constructing a number of hidden layers using training data, enabling it to automatically learn more useful features and enhance the accuracy of the final classification or prediction.
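A minimal sketch of Eqs. (16)-(19) on a one-hidden-layer network, assuming a tanh activation, a squared-error objective and illustrative sizes; none of these choices is prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 4                                    # input and hidden sizes (illustrative)
V, c = rng.normal(size=(m, d)), np.zeros(m)    # hidden weights/biases, Eq. (19)
w, b = rng.normal(size=m), 0.0                 # output weights/bias, Eq. (18)
alpha = 0.05                                   # learning rate, Eqs. (16)-(17)

def forward(x):
    u = V @ x + c            # u_i = sum_j w_ij x_j + b_i   (Eq. 19)
    z = np.tanh(u)           # z_i = h(u_i)
    return u, z, b + w @ z   # g(x) = b + w h(c + Vx)       (Eq. 18)

x, t = rng.normal(size=d), 1.0   # one training pair
for _ in range(100):
    u, z, y = forward(x)
    e = y - t                                  # output error
    # Steepest-descent updates, Eqs. (16)-(17): theta <- theta - alpha * dF/dtheta
    w -= alpha * e * z
    b -= alpha * e
    delta = e * w * (1 - z**2)                 # error backpropagated through tanh
    V -= alpha * np.outer(delta, x)
    c -= alpha * delta

print(forward(x)[2])  # the output approaches the target t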

2.6.1 General Construction and Properties of Convolutional Neural Networks


With the improvement of hardware performance and the optimization of algorithms, convolutional neural network applications have developed from simple classification tasks to a level that can exceed human recognition ability. Different network structures are often needed for different classification tasks. For simple classification tasks, fewer convolution, pooling and nonlinear layers are used, and each layer uses fewer convolution kernels to extract the distinguishing characteristics of the various classes. For more difficult classification tasks, the structure of the neural network is more complicated and the number of network parameters is larger.

When the training samples are insufficient and there are too many network parameters, the model is subject to overfitting. To avoid the overfitting phenomenon when building a network, regularization techniques known as 'early stopping', 'Dropout' and so on are usually added to the training procedure. Hidden-layer neurons are temporarily dropped from the network with a certain probability, and the weight coefficients of these neurons are not updated, though the weights are still retained.

The development of GPUs led to an increase in computing power. To make full use of the robust computing capability of GPUs, the neural network is trained using mini-batch stochastic gradient descent instead of full-batch gradient descent: the original sample set is reduced to random mini-batches, whose gradients are used for fine-tuning of the parameters. When a mini-batch contains more images, the GPU memory required for training will increase. The randomness of each mini-batch is chosen so that the network can still converge. Compared with single-sample stochastic gradient descent, mini-batch gradient descent increases the quantity of data processed each time, improves the utilization ratio of the GPU and enhances the efficiency of the training process. Compared with the full-batch gradient descent method, which uses all the samples to update the parameters at once, the computational overhead is reduced and the training time is shortened. The method of stochastic gradient descent with a momentum parameter can make the network converge faster.

The training speed and model accuracy of the convolutional neural network are influenced by many factors. Training data often show considerable numerical differences between feature components, leading to an unstable decline of the training error, a slow learning rate, or even failure to converge. Mean reduction, Z-score normalization and whitening operations can eliminate the differences in numerical values between different feature components and improve the learning performance of the network. The mean reduction is calculated as follows:

$\bar{x}_q = \frac{1}{nms}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{s} x_q^{(i,j,k)}, \qquad \tilde{x}_q^{(i,j,k)} = x_q^{(i,j,k)} - \bar{x}_q, \qquad q \in \{R, G, B\},\ j \in \{1, 2, \ldots, m\},\ k \in \{1, 2, \ldots, s\}$   (20)
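A minimal NumPy sketch of the per-channel mean reduction of Eq. (20), assuming the training set is stored as an (n, m, s, 3) array of RGB images.

```python
import numpy as np

# Assumed layout: n images of m x s pixels with 3 channels (R, G, B).
images = np.random.default_rng(0).uniform(0, 255, size=(16, 32, 32, 3))

# Eq. (20): per-channel mean over all images and pixels, then subtraction.
channel_mean = images.mean(axis=(0, 1, 2))           # shape (3,), one mean per q
centered = images - channel_mean                     # x_tilde = x - x_bar

print(channel_mean)                   # approx. 127.5 for each channel
print(centered.mean(axis=(0, 1, 2)))  # approx. zero after the reduction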

2.7 The Fundamental Principles of Convolutional Neural Networks

The classical convolutional neural network is composed of the input layer, the convolutional layers, the down-sampling (pooling) layers, the fully connected layers and an output layer. The input of the convolutional neural network is usually the original image, and the feature map of the i-th layer is computed as:

$X_i = F(X_{i-1} * W_i + b_i)$   (21)

The down-sampling layer usually follows the convolutional layer and samples the feature map according to some down-sampling rule. The down-sampling layer has two primary functions: reducing the dimensionality of the feature map, and maintaining the scale-invariance characteristics of the features to a certain extent.

The first stage is forward propagation. The information is transmitted progressively from the input layer to the output layer; this is also the process that the trained network executes in normal operation. In this stage, the network executes the following computation:

$O_p = F_n(\cdots F_2(F_1(X_0 W_1 + b_1)W_2 + b_2)\cdots W_n + b_n)$   (22)


The second stage is backward propagation. The training objective of the convolutional neural network is to minimize the loss function of the network. The difference between the calculated and expected values of the loss function after forward conduction is called the residual error. Common loss functions include the mean-square error function and the negative log-likelihood function. In the training process, the most commonly used optimization method for convolutional neural networks is the gradient descent method. The residuals propagate through gradient descent and update the training parameters of each layer of the convolutional neural network, layer by layer. The learning-rate parameter $\eta$ is used to control the intensity of residual backpropagation:

$E(W, b) = L(W, b) + \frac{\lambda}{2}W^T W$   (23)

$W_i = W_{i-1} - \eta \frac{\partial E(W, b)}{\partial W_{i-1}}$   (24)

$b_i = b_{i-1} - \eta \frac{\partial E(W, b)}{\partial b_{i-1}}$   (25)
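One training step under Eqs. (21)-(25) can be sketched as follows in PyTorch; the two-layer network, the MSE loss and the weight-decay coefficient (playing the role of λ in Eq. (23)) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Two stacked convolutional layers, each X_i = F(X_{i-1} * W_i + b_i), Eq. (21).
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
loss_fn = nn.MSELoss()
# weight_decay adds the (lambda/2) W^T W penalty of Eq. (23) to the loss.
opt = torch.optim.SGD(net.parameters(), lr=0.01, weight_decay=1e-4)

x, target = torch.randn(4, 1, 16, 16), torch.randn(4, 1, 16, 16)
out = net(x)                  # forward propagation, Eq. (22)
loss = loss_fn(out, target)   # residual error between computed and expected values
opt.zero_grad()
loss.backward()               # backward propagation of the residuals
opt.step()                    # gradient-descent updates, Eqs. (24)-(25)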

The basic convolution filter of a convolutional neural network is a generalized linear model, which is good at extracting linearly separable instances of implicit concepts. At present, there are two improvements that can be made to convolutional layers to improve the filter representation capability. Lin, Chen and Yan pay considerable attention to the abstract features of the improved network structure. Their proposed model is the 'Network in Network' model, whose innovations are an mlpconv layer and global mean pooling. The mlpconv layer can be considered as placing a micro multilayer network inside the local receptive field of each convolution. The use of micro network layers results in more complex operations in each of the local receptive fields of neurons; the model replaces the generalized linear filter and has a stronger ability to enhance the expression ability of a traditional CNN. The advantages of MLP are: very efficient use of the function approximator, the ability to be trained by the BP algorithm, the ability to be entirely integrated into a CNN, the fact that it is itself a deep model and the fact that its features can be reused. Unlike the traditional fully connected layer, the model pools the global mean of each feature map of a whole picture, so that each feature map yields one output. This global mean pooling reduces the number of parameters and can significantly shrink the network and avoid overfitting.

The 'Inception module' proposed by Szegedy uses a variety of filter sizes to capture visualization modes of different sizes and approaches the optimal sparse structure. In particular, the Inception module consists of a pooling operation and three convolution operations. 1 × 1 convolutions are placed before the larger convolutions as dimension-reduction modules, increasing the depth and width of the CNN without increasing the computational complexity.

The neural network algorithm uses a multilayer perceptron network consisting of three layers of neurons (perceptrons); this is also called the Delta rule in a backpropagation network. The layers are the input layer, the optional hidden layer and the output layer. In a multilayer perceptron network, each neuron receives one or more inputs and produces one or more outputs. Each output is a simple nonlinear function of the sum of the inputs to the neuron. The input values are passed from the nodes in the input layer to the nodes in the hidden layer and finally to the output layer; there is no connection between neurons in the same layer.

A mining model constructed using a neural network algorithm can contain multiple networks, depending on the number of columns used for input and prediction, or depending on the number of columns used for prediction only. The number of networks that a mining model contains depends on the number of states of the input columns and the predicted columns used by the mining model.

Input neurons provide input attribute values for the data mining model. For discrete input attributes, an input neuron usually represents a single state of the input property, including the missing value. For example, a binary input property generates an input node that indicates the missing or existing state, i.e., whether the property has a value. A Boolean value used as an input attribute generates three input neurons: one neuron for the True value, one neuron for the False value and one neuron for the missing or existing state. A discrete input property with more than two states generates an input neuron for each state and also an input neuron for the missing or existing state. A continuous input attribute generates two input neurons. The input neurons provide input to one or more hidden neurons.

The hidden neurons receive input from the input neurons and provide output to the output neurons. The output neurons represent the values of the predictable property of the data mining model. For discrete attributes, an output neuron typically represents a single prediction state of a predictable property, including missing values. For example, a binary predictable attribute generates an output node that describes the missing or existing state, indicating whether the property has a value. The applications of the neural network algorithm are related to regression tasks and classification tasks.

2.8 Regularization of Neural Networks


Overfitting is a common problem in many machine learning scenarios. Researchers improve generalization by increasing the sparsity and randomness of the network. The 'Dropout' model proposed by Hinton et al. randomly ignores the responses of some nodes during training in order to reduce overfitting of the fully connected network and to enhance network-wide performance. When Dropout is applied to a fully connected layer that takes the output of the feature extractor as its input, the nonlinear activation of the layer is multiplied element-wise by a binary mask whose elements are drawn from a Bernoulli distribution and whose size matches the output of the layer.

For the 'DropConnect' model, instead of zeroing the outputs of neurons in the forward pass, there is a certain probability that individual input connections do not work; in the course of BP training, the disabled connections make no error contribution. The difference between Dropout and DropConnect is that in Dropout, if an output does not work, it cannot form an input for the next level, whereas DropConnect disables weights rather than outputs. DropConnect's generalization ability is a little stronger.
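A minimal NumPy sketch contrasting the two masking strategies, assuming a single fully connected layer with a tanh activation: Dropout masks the layer's activations, while DropConnect masks the weight matrix itself. Sizes and the keep probability are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))    # fully connected weights (illustrative sizes)
x = rng.normal(size=8)
p = 0.5                        # keep probability

# Dropout: Bernoulli mask applied to the activated outputs of the layer.
dropout_mask = rng.binomial(1, p, size=4)
y_dropout = dropout_mask * np.tanh(W @ x)

# DropConnect: Bernoulli mask applied to the weight matrix itself.
dropconnect_mask = rng.binomial(1, p, size=W.shape)
y_dropconnect = np.tanh((dropconnect_mask * W) @ x)

print(y_dropout)      # some activations are zeroed entirely
print(y_dropconnect)  # all activations survive, but each uses a thinned W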

2.9 Improvement of Activation Function


In an artificial neural network, the activation function of a neuron node defines the mapping from the neuron's summed input to its output; in a nutshell, the weighted input of the neuron is passed through the activation function to produce the output. At present, most of these functions are piecewise linear or exponential nonlinear functions. In Figure 7, we present two activation functions.


(a) Sigmoid function

(b) Tanh function

Figure 7. Two activation functions


The sigmoid function has been used in the past, but less so in recent years. In addition to the above S-type functions, the ReLU function is also a commonly used activation function. ReLU is a piecewise linear function, presented in Figure 8.

Figure 8. ReLU and LReLU/PReLU
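A minimal sketch of the activation functions discussed in this subsection; the LReLU slope of 0.01 is an assumed conventional value, and in PReLU that slope would be a learned parameter.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # S-type, saturates at 0 and 1

def tanh(x):
    return np.tanh(x)                   # S-type, saturates at -1 and 1

def relu(x):
    return np.maximum(0.0, x)           # piecewise linear, zero for x < 0

def lrelu(x, a=0.01):
    return np.where(x > 0, x, a * x)    # leaky slope a for x < 0 (learned in PReLU)

x = np.linspace(-3, 3, 7)
print(relu(x), lrelu(x))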

2.10 Pyramid Convolutional Neural Network

After several years of development and much accumulated experience, regulating the construction and parameters of the CNN framework has become a frequently debated issue. Moreover, with the development of computer storage and transmission technology, the resolution of images has increased, further slowing CNN training. Although GPUs have been greatly improved, the rate of improvement still cannot meet the needs of research and applications. There is therefore a need to improve the training speed of CNNs further, starting from the training characteristics of the CNN. When a traditional CNN (TCNN) is trained, most of the time is spent on convolution filtering of the larger images. Parameter adjustment is carried out over the whole network, and network training must go through many forward and backward passes of parameter adjustment while the input images of the first several layers of the network are relatively large; thus, most of the TCNN training time is spent on convolution filtering of large-sized images. From the point of view of the structure of the CNN, ensuring that the training process operates only on small-sized images, or passes over the large-scale image only once, will significantly reduce the training time and help to adjust the connection parameters. Using the pyramid CNN (PCNN) training method, we can ensure that TCNN training takes place only on small-sized image blocks; for large-sized images the TCNN no longer trains but only applies the shared weights W and the pooled convolution prior to the transmission operation. This will greatly improve the training speed of the network.

Color is a powerful descriptor which can often be used to distinguish between objects, which can then be simplified and extracted from the scene. While human beings can identify thousands of colors with different hues and brightness, they can only identify dozens of grayscale levels, and therefore a color image is preferred. Most TCNNs now deal with color images only at the first level using color factors, and the same filter is used in the second and deeper layers, leading to some color features being partially missing. With the development of computer storage and transmission technology, the resolution of color and multispectral images keeps growing; direct processing of color or multispectral images can greatly increase the computer memory required for the algorithm, resulting in a decreased training speed.

3. Proposed Watermark Embedding and Extraction Method

In general, the watermark extraction process is the inverse of the watermark embedding process and uses the original host image. The extraction algorithm used in this study extracts the watermark using the neural network and does not need the original host image; this is therefore called blind watermark detection. Digital watermarking algorithms usually include watermark embedding and watermark detection or extraction. A watermark can take many forms, such as a random number sequence, a digital identification, text or an image, and we often need to randomize and encrypt the watermark. Assume c represents the original carrier signal, m represents the watermark signal and K is the key. The watermark embedding process is as follows:

$c_w = E_c(c, F(c, m, K))$   (25)

The watermark detection and extraction process can be represented by:

$\hat{m} = D_e(c_w, m, c, K)$   (26)

3.1 Watermark Sequence Generation

Meaningful watermark signals represent a meaningful text, sound, image or video signal. Using a meaningful watermark signal, the extracted watermark can be intuitively and directly identified if the watermark is contained in the carrier. To ensure the reliability and security of the watermark, the meaningful watermark signal should be encrypted as a watermark signal sequence.

To ensure the security of the watermark pattern, it is randomly scrambled before being embedded so that it cannot be recognized by an illegal attacker. The original watermark image and the scrambled watermark image are denoted W and W* respectively:

$W^* = \{w^*_{i,j} = w_{i',j'} \mid 1 \le i, i' \le m;\ 1 \le j, j' \le m\}$   (27)

Firstly, the watermark is scanned into a one-dimensional vector and then labeled. A random permutation is generated according to the key, and the position of each vector element is readjusted according to the random arrangement. The vector is then reshaped into a two-dimensional matrix, representing the scrambled watermark. In Figures 9 and 10, we show the watermark embedded into the HH2 subgraph and the rough scale subgraphs A, B, C, D, respectively.

Figure 9. Watermark embedded into HH2 subgraph

Figure 10. Rough scale subgraphs A, B, C, D
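A minimal sketch of the key-driven scrambling of Eq. (27), assuming the key seeds a pseudorandom permutation; descrambling applies the inverse permutation. The function names are illustrative.

```python
import numpy as np

def scramble(watermark, key):
    """Key-driven permutation of an m x m watermark, as in Eq. (27)."""
    flat = watermark.flatten()                     # scan into a 1-D vector
    perm = np.random.default_rng(key).permutation(flat.size)
    return flat[perm].reshape(watermark.shape)     # reshape to 2-D

def descramble(scrambled, key):
    perm = np.random.default_rng(key).permutation(scrambled.size)
    flat = np.empty_like(scrambled.flatten())
    flat[perm] = scrambled.flatten()               # invert the permutation
    return flat.reshape(scrambled.shape)

key = 2018
W = (np.random.default_rng(1).random((8, 8)) > 0.5).astype(np.uint8)
assert np.array_equal(W, descramble(scramble(W, key), key))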

Because of the low-pass and high-pass filtering of the multiwavelet transform in both matrices, we need to preprocess the scalar sequence into a vector sequence. The pretreatment method used involves duplicating the rows of the image, multiwavelet decomposition, and the introduction of four subgraphs on a coarse scale, recorded as A, B, C, D. The mean map of the rough scale subgraphs is calculated by:

$X = \{(x_{i,j}) \mid x_{i,j} = (a_{i,j} + b_{i,j} + c_{i,j})/3\}, \qquad a_{i,j} \in A,\ b_{i,j} \in B,\ c_{i,j} \in C$   (28)

Embedding the scrambled binary watermark bits by modifying $d_{i,j}$ relative to $x_{i,j}$ leads to:

$d'_{i,j} = \begin{cases} \max(d_{i,j},\ x_{i,j} + \alpha), & \text{if } w^*_{i,j} = 1 \\ \min(d_{i,j},\ x_{i,j} - \alpha), & \text{else} \end{cases}$   (29)

where $\alpha$ is a system parameter whose value is determined by the user. The greater the value of $\alpha$, the stronger the robustness, but the distortion will also increase.
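A minimal sketch of the embedding rule of Eq. (29), assuming d and x are the coefficient map and mean map defined above and alpha is the user-chosen strength factor; the names and values are illustrative.

```python
import numpy as np

def embed_bits(d, x, w_scrambled, alpha=2.0):
    """Modify coefficients d relative to the mean map x, per Eq. (29)."""
    up = np.maximum(d, x + alpha)     # bit 1: push coefficient above x + alpha
    down = np.minimum(d, x - alpha)   # bit 0: push coefficient below x - alpha
    return np.where(w_scrambled == 1, up, down)

rng = np.random.default_rng(0)
d = rng.normal(size=(8, 8))
x = np.zeros((8, 8))                  # illustrative mean map
bits = rng.integers(0, 2, size=(8, 8))
d_marked = embed_bits(d, x, bits)
# The sign of d' - x now encodes the embedded bit:
assert np.array_equal((d_marked > x).astype(int), bits)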

3.2 Adaptive Extraction Algorithm Based on the Neural Network

The watermark is extracted according to the sign of a decision function $\delta_{i,j}$ when the watermarked image is subjected to various attacks, such as a slight distortion. When the watermarked image is distorted, the decision function $\delta_{i,j}$ alone is less reliable; therefore, the judgment function needs to be adapted to the various attacks. The relation model between watermarks can be expressed with respect to training samples:

$\delta \tilde{a}_{i,j} = \tilde{d}_{i,j} - \tilde{a}_{i,j}$   (30)

$\delta \tilde{b}_{i,j} = \tilde{d}_{i,j} - \tilde{b}_{i,j}$   (31)

$\delta \tilde{c}_{i,j} = \tilde{d}_{i,j} - \tilde{c}_{i,j}$   (32)

In Equation 33, $\tilde{t}_{i,j}$ represents the desired output for the neural network:

$\tilde{t}_{i,j} = \begin{cases} \tilde{x}_{i,j}/k, & \text{if } w_{i,j} = 1 \\ -\tilde{x}_{i,j}/k, & \text{else} \end{cases}$   (33)

$\Omega = \{(\delta \tilde{a}_{i,j},\ \delta \tilde{b}_{i,j},\ \delta \tilde{c}_{i,j},\ \tilde{t}_{i,j}) \mid 1 \le i \le p,\ 1 \le j \le q\}$   (34)

During transmission and use of the watermarked image carrier, the image will be modified and attacked to a certain extent. Therefore, after processing or attacks over a certain intensity, the direct extraction of a meaningful watermark can be ambiguous. A pattern recognition method based on the synergetic neural network can be used to detect, extract and recognize the watermark in the image carrier. Since the cooperative order parameter represents a matching measure between the prototype pattern and the test pattern, the pattern assignment of the input watermark can be identified by learning and training in the cooperative neural network.

3.3 Adaptive Watermark Embedding Algorithm

The core of the adaptive watermark embedding algorithm proposed in this paper is to automatically obtain the embedding intensity factor of the watermark, since a proper embedding strength balances the robustness and imperceptibility of the watermarked image. When the embedding intensity factor is set by experience, it is difficult to evaluate its rationality objectively. The peak signal-to-noise ratio can be used to measure the difference between the original host image and the watermarked image, i.e., the influence of the watermark on the original image. The larger the PSNR value, the smaller the impact on the image. According to human visual characteristics, if the PSNR of the embedded watermark is greater than a certain threshold value, then it is imperceptible to the human eye:

$PSNR = 10\log_{10}\frac{3MN(255)^2}{\sum_{R,G,B}\sum_{i,j}\left[I_{R,G,B}(i,j) - I'_{R,G,B}(i,j)\right]^2}$   (35)
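A minimal sketch of the color-image PSNR of Eq. (35), assuming 8-bit M × N RGB images stored as NumPy arrays.

```python
import numpy as np

def psnr_rgb(original, marked):
    """PSNR over all three channels of 8-bit M x N images, per Eq. (35)."""
    original = original.astype(np.float64)
    marked = marked.astype(np.float64)
    mse_sum = np.sum((original - marked) ** 2)   # sum over R, G, B and all pixels
    m, n = original.shape[:2]
    return 10 * np.log10(3 * m * n * 255**2 / mse_sum)

rng = np.random.default_rng(0)
host = rng.integers(0, 256, size=(64, 64, 3))
noisy = np.clip(host + rng.integers(-2, 3, size=host.shape), 0, 255)
print(psnr_rgb(host, noisy))   # high PSNR: the perturbation is small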

3.4 Watermark Detection Method


We can use intuitive observation of the degree of similarity between the extracted watermark and the original watermark, but this method is easily affected by experience, experimental conditions and other factors, and therefore it is necessary to perform a correlation test.

Assume the original watermark is w(i, j), the extracted watermark is w'(i, j) and the watermark size is P × Q. The normalized correlation is as follows:

$NC = \frac{\sum_{P,Q} w(i,j)\, w'(i,j)}{\sum_{P,Q} w(i,j)^2}$   (36)

To decide whether a watermark is present in the image, a threshold $T_n$ is set in advance. If $NC \ge T_n$, a watermark is considered to exist; otherwise it does not.
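A minimal sketch of the detection test of Eq. (36); the threshold value of 0.75 for T_n is purely illustrative, as the text leaves it to be set in advance.

```python
import numpy as np

def normalized_correlation(w, w_extracted):
    """NC = sum(w * w') / sum(w^2), per Eq. (36)."""
    w = w.astype(np.float64)
    return np.sum(w * w_extracted) / np.sum(w ** 2)

def watermark_present(w, w_extracted, threshold=0.75):  # illustrative T_n
    return normalized_correlation(w, w_extracted) >= threshold

rng = np.random.default_rng(0)
w = rng.integers(0, 2, size=(32, 32))
w_noisy = np.abs(w - (rng.random((32, 32)) < 0.05))  # flip about 5% of the bits
print(normalized_correlation(w, w_noisy), watermark_present(w, w_noisy))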

3.5 Applications in Smart Cities Using the Proposed Method

The previous sections introduced the proposed method and the related background. This subsection analyzes further applications for the (vision-specialized) information security model in the construction of smart cities. Information security risks, or vulnerability to basic natural threats in the utilization of information systems and their management, result in the occurrence of security incidents with consequent impacts on organizations. Although the terminals used to access smart city data do not belong to the category of smart city construction, when many user terminals are required, smart city project-related management and technical personnel must consider the impact of the smart city system on these terminals. Information security risk assessment is based on information security technology and management standards, and evaluates the security properties, such as confidentiality, integrity and availability, of the information system and of the processing, transmission and storage of its information. It assesses the threats to assets and the likelihood that these threats exploit vulnerabilities to cause security incidents, judges the effect of such incidents on the organization in combination with the value of the assets involved, and puts forward targeted protective measures and corrective actions against the threats. Information security risk assessment attempts to guard against information security risk, or to reduce it to an acceptable level as far as possible on a scientific basis, in order to protect the safety of the network and the information. Therefore, the following aspects should be emphasized. (1) Because the responsibility for smart city security belongs to its management department, some departments believe that leaks can only be

prevented by not sharing. However, data sharing technology has been mature for some time in the area of safe credit. Data sharing is necessary in order to provide more convenient services for users, and data sharing is also the general trend in smart cities. (2) Some departments consider the data in the systems they manage to be of value only to their own business and fail to recognize the value of their data to other sectors or businesses and to citizens. (3) Some smart city information systems lack unified management processes and norms, resulting in poor quality of collected data, or in inconsistent quality of data collected at different times and places or by different operators, resulting in more data conflicts.

4. Experiments and Simulations

In this section, we describe the experimental simulations using the proposed algorithm. Figure 11 shows sample sets of the simulation images.

Figure 11. Sample sets of the simulation images

Transparency, robustness, and security are three important indicators for evaluating the quality of image watermarking. Transparency means that the embedding of the watermark does not affect the normal use of the image: since it does not cause a reduction in the visual quality of the image, the existence of the watermark in the image cannot be perceived. Digital watermarks should be suitable and should not affect the normal use of the protected data. Robustness refers to the ability to detect the watermark after a conventional signal processing operation. Conventional operations for images include spatial filtering, JPEG compression, clipping attacks, printing and copying, geometric deformations and noise attacks. In some cases, robustness is useless or even avoided, since fragile watermarking requires that any signal processing operation on the image destroys the watermark. Watermark embedding, embedding intensity, image size and other factors all influence the robustness of the watermark. The greater the embedding strength of the watermark, the better the robustness, but the worse the transparency. Security refers to the ability of the watermark to resist attacks on the various watermarking functions, i.e., unauthorized persons cannot remove, embed or detect the watermark. At the same time, the watermark information should be difficult to copy or fake. In our experiments, we use PSNR as the judging standard. In Table 1, we show the PSNR simulation data for the proposed method on the image data set. In Tables 2 and 3, we present two sets of results comparing the proposed method with those in [1], [9], [35] and [37]. In Table 4, we present a comparison of the execution times of the different methods. It can be concluded from the simulation results that the proposed model performs better than the others.

Table 1. PSNR for the proposed method on the image data set

Host image      PSNR    PSNR-JND
Barbara         50.12   30.45
Boat            49.56   30.63
House1          49.85   28.52
House2          49.46   28.51
House3          49.53   30.56
House4          50.62   30.45
Pirate          50.15   29.32
Living room     49.78   28.56
Pentagon        48.95   30.89
Kiel            47.63   30.56
Lighthouse      47.75   31.43
Peppers         48.41   30.52
Lake            48.20   29.49

Table 2. The simulation comparison results: set 1

Host image/PSNR   Proposed   [37]    [35]    [9]     [1]
Barbara           50.12      48.46   46.46   47.35   46.88
Boat              49.56      47.62   48.75   49.07   49.15
House1            49.85      48.53   48.62   48.96   48.85
House2            49.46      47.98   46.95   47.56   47.01
House3            49.53      47.51   45.42   49.23   49.11
House4            50.62      47.42   45.03   49.83   46.74
Pirate            50.15      47.56   46.26   49.33   48.85
Living room       49.78      47.23   45.53   48.26   48.50
Pentagon          48.95      47.56   46.98   47.12   48.30
Kiel              47.63      47.78   45.75   45.96   46.72
Lighthouse        47.75      47.56   45.45   47.01   46.59
Peppers           48.41      47.89   48.26   47.23   47.70
Lake              48.20      48.51   47.41   47.33   46.95

Table 3. The simulation comparison results: set 2

Host image/PSNR   Proposed   [37]    [35]    [9]     [1]
Barbara           50.56      49.48   48.95   47.47   46.45
Boat              50.52      49.56   48.86   47.56   46.86
House1            50.54      48.53   47.84   48.53   47.95
House2            49.15      48.62   47.75   48.54   47.30
House3            49.26      48.78   48.26   48.20   47.21
House4            50.53      47.52   49.35   47.21   46.79
Pirate            50.59      49.16   49.02   49.58   47.56
Living room       50.86      49.53   47.23   49.96   47.51
Pentagon          49.64      47.92   46.21   46.53   46.86
Kiel              50.75      48.62   48.45   46.54   49.95
Lighthouse        50.23      48.54   48.86   46.26   49.93
Peppers           50.07      48.75   48.95   46.51   46.95
Lake              49.45      47.89   47.87   46.57   46.82

Table 4. Comparison of execution time for different methods

Execution time    Proposed   [37]     [35]     [9]      [1]
Embedding time    0.6815     0.7154   0.6979   0.7515   0.6912
Extraction time   0.5147     0.6238   0.6570   0.7032   0.5869
Total time        1.1962     1.3392   1.3549   1.4547   1.2781

5. Conclusion

To meet the challenges of image security detection technology and adapt it to the current diversity of media types, this paper uses the convolutional neural network to classify images according to the method of image generation. We use convolutional neural networks and pooling operations to extract features of natural images and document-scanned images, and to construct a high-speed and high-precision image type recognition system. There are three aspects to be highlighted regarding this topic. (1) Fast R-CNN is adopted: compared to R-CNN, Fast R-CNN changes from a single input to a dual input, with two outputs after the fully connected layer, and an RoI layer is introduced. (2) The convolution kernel size is finalized: the three-layer convolution of SRCNN is a 9-1-5 model, and the improved algorithm is a 3-3-3 model. (3) An improvement in the activation function is proposed.

The proposed classification method has a classification accuracy of over 93.75% for the general image library. Through comparative experiments, it is shown that image preprocessing has a positive effect on the accuracy of the model and on the time required for training convergence. We will further investigate the proposed model using other databases.

Acknowledgment

This study was financially supported by the Project of Macau Foundation (No. M1617): The First-phase Construction of Big-Data on Smart Macao. This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1A6A1A03015496) and by the NRF grant funded by the Korea government (Ministry of Science and ICT) (No. 2017R1E1A1A01077913).

References

1. Badshah, G., Liew, S.C., Zain, J.M. and Ali, M., 2016. Watermark compression in medical image watermarking using Lempel-Ziv-Welch (LZW) lossless compression technique. Journal of Digital Imaging, 29(2), pp.216-225.
2. Bakshi, S., Sa, P.K., Wang, H. et al. Multimed Tools Appl (2017). https://doi.org/10.1007/s11042-017-4965-6.
3. Boureau, Y.L., Ponce, J. and LeCun, Y., 2010. A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 111-118).
4. Cai, Z., Deng, L., Li, D. et al. Cluster Comput (2017). https://doi.org/10.1007/s10586-017-1216-6.
5. Cui, J., Liu, Y., Xu, Y., Zhao, H. and Zha, H., 2013. Tracking generic human motion via fusion of low- and high-dimensional approaches. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 43(4), pp.996-1002.
6. Chen, Q., Zhang, G., Yang, X. et al. Multimed Tools Appl (2017). https://doi.org/10.1007/s11042-017-5299-0.
7. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. and Fei-Fei, L., 2009, June. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on (pp. 248-255). IEEE.
8. Dahl, G.E., Sainath, T.N. and Hinton, G.E., 2013, May. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 8609-8613). IEEE.
9. Etemad, E., Samavi, S., Reza Soroushmehr, S.M. et al. Multimed Tools Appl (2017). https://doi.org/10.1007/s11042-016-4278-1.
10. Girshick, R., Donahue, J., Darrell, T. and Malik, J., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580-587).
11. Hubel, D.H. and Wiesel, T.N., 1962. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1), pp.106-154.
12. He, K., Zhang, X., Ren, S. and Sun, J., 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1026-1034).
13. He, K., Zhang, X., Ren, S. and Sun, J., 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), pp.1904-1916.
14. Hinton, G.E. and Salakhutdinov, R.R., 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786), pp.504-507.
15. Jiao, J., Permuter, H.H., Zhao, L., Kim, Y.H. and Weissman, T., 2013. Universal estimation of directed information. IEEE Transactions on Information Theory, 59(10), pp.6220-6242.
16. Jin, J., Fu, K. and Zhang, C., 2014. Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Transactions on Intelligent Transportation Systems, 15(5), pp.1991-2000.
17. Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
18. Kim, I.J. and Xie, X., 2015. Handwritten Hangul recognition using deep convolutional neural networks. International Journal on Document Analysis and Recognition (IJDAR), 18(1), pp.1-13.
19. Lyu, S. and Farid, H., 2005. How realistic is photorealistic? IEEE Transactions on Signal Processing, 53(2), pp.845-850.
20. Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., Cao, L. and Huang, T., 2011, June. Large-scale image classification: fast feature extraction and SVM training. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1689-1696). IEEE.
21. Liu, Y., Zhang, L., Nie, L., Yan, Y. and Rosenblum, D.S., 2016, February. Fortune Teller: Predicting Your Career Path. In AAAI (pp. 201-207).
22. Liu, Y., Cui, J., Zhao, H. and Zha, H., 2012, November. Fusion of low- and high-dimensional approaches by trackers sampling for generic human motion tracking. In Pattern Recognition (ICPR), 2012 21st International Conference on (pp. 898-901). IEEE.
23. Lin, M., Chen, Q. and Yan, S., 2013. Network in network. arXiv preprint arXiv:1312.4400.
24. Liu, Y., Zheng, Y., Liang, Y., Liu, S. and Rosenblum, D.S., 2016. Urban water quality prediction based on multi-task multi-view learning.
25. Liu, Y., Liang, Y., Liu, S., Rosenblum, D.S. and Zheng, Y., 2016. Predicting urban water quality with ubiquitous data. arXiv preprint arXiv:1610.09462.
26. Liu, Y., Nie, L., Han, L., Zhang, L. and Rosenblum, D.S., 2015, July. Action2Activity: Recognizing Complex Activities from Sensor Data. In IJCAI (pp. 1617-1623).
27. Liu, Y., Nie, L., Liu, L. and Rosenblum, D.S., 2016. From action to activity: Sensor-based activity recognition. Neurocomputing, 181, pp.108-115.
28. Liu, L., Cheng, L., Liu, Y., Jia, Y. and Rosenblum, D.S., 2016, February. Recognizing Complex Activities by a Probabilistic Interval-Based Model. In AAAI (Vol. 30, pp. 1266-1272).
29. Lu, Y., Wei, Y., Liu, L., Zhong, J., Sun, L. and Liu, Y., 2017. Towards unsupervised physical activity recognition using smartphone accelerometers. Multimedia Tools and Applications, 76(8), pp.10701-10719.
30. Liang, R.Z., Shi, L., Wang, H., Meng, J., Wang, J.J.Y., Sun, Q. and Gu, Y., 2016, December. Optimizing top precision performance measure of content-based image retrieval by learning similarity function. In Pattern Recognition (ICPR), 2016 23rd International Conference on (pp. 2954-2958). IEEE.
31. Liu, Y., Zhang, X., Cui, J., Wu, C., Aghajan, H. and Zha, H., 2010, October. Visual analysis of child-adult interactive behaviors in video sequences. In Virtual Systems and Multimedia (VSMM), 2010 16th International Conference on (pp. 26-33). IEEE.
32. Mahendran, A. and Vedaldi, A., 2015. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5188-5196).
33. Preoţiuc-Pietro, D., Liu, Y., Hopkins, D. and Ungar, L., 2017. Beyond binary labels: political ideology prediction of twitter users. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 729-740).
34. Plageras, A.P., Psannis, K.E., Stergiou, C., Wang, H. and Gupta, B.B., 2017. Efficient IoT-based sensor BIG Data collection-processing and analysis in smart buildings. Future Generation Computer Systems.
35. Qian, H., Tian, L. and Li, C., 2016, August. Robust Blind Image Watermarking Algorithm Based On Singular Value Quantization. In Proceedings of the International Conference on Internet Multimedia Computing and Service (pp. 277-280). ACM.
36. Sutskever, I., Martens, J., Dahl, G. and Hinton, G., 2013, February. On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning (pp. 1139-1147).
37. Su, Q. and Chen, B., 2017. Robust color image watermarking technique in the spatial domain. Soft Computing, pp.1-16.
38. Scherer, D., Müller, A. and Behnke, S., 2010. Evaluation of pooling operations in convolutional architectures for object recognition. Artificial Neural Networks - ICANN 2010, pp.92-101.
39. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).
40. Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986. Learning representations by back-propagating errors. Nature, 323(6088), pp.533-538.
41. Wang, Y. and Moulin, P., 2006, May. On discrimination between photorealistic and photographic images. In Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on (Vol. 2, pp. II-II). IEEE.
42. Wang, Y., Li, J. and Wang, H.H. Cluster Comput (2017). https://doi.org/10.1007/s10586-017-1199-3.
43. Wang, H. and Wang, J., 2014, November. An effective image representation method using kernel classification. In Tools with Artificial Intelligence (ICTAI), 2014 IEEE 26th International Conference on (pp. 853-858). IEEE.
44. Zhu, J.Y., Krahenbuhl, P., Shechtman, E. and Efros, A.A., 2015. Learning a discriminative model for the perception of realism in composite images. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3943-3951).
45. Zareapoor, M., Shamsolmoali, P., Jain, D.K., Wang, H. and Yang, J., 2017. Kernelized support vector machine with deep learning: An efficient approach for extreme multiclass dataset. Pattern Recognition Letters.
46. Zhang, S., Wang, H. and Huang, W. Cluster Comput (2017) 20: 1517. https://doi.org/10.1007/s10586-017-0859-7.