A hybrid convolution network for serial number recognition on banknotes


Journal Pre-proof

A Hybrid Convolution Network for Serial Number Recognition on Banknotes
Feng Wang, Huiqing Zhu, Wei Li, Kangshun Li

PII: S0020-0255(19)30921-1
DOI: https://doi.org/10.1016/j.ins.2019.09.070
Reference: INS 14901

To appear in: Information Sciences

Received date: 8 June 2019
Revised date: 25 September 2019
Accepted date: 26 September 2019

Please cite this article as: Feng Wang, Huiqing Zhu, Wei Li, Kangshun Li, A Hybrid Convolution Network for Serial Number Recognition on Banknotes, Information Sciences (2019), doi: https://doi.org/10.1016/j.ins.2019.09.070

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Inc.

A Hybrid Convolution Network for Serial Number Recognition on Banknotes

Feng Wang^a, Huiqing Zhu^a, Wei Li^b, Kangshun Li^c,*

a School of Computer Science, Wuhan University, Wuhan, 430072, China
b School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China
c College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China

Abstract

As the sole identity of a banknote, the serial number plays a crucial role in monitoring the circulation of currencies. Serial number recognition is important in the financial market, and real applications require it to be both fast and accurate. In this paper, a hybrid convolution network model is proposed, in which a dilated convolution neural network is employed to improve recognition accuracy and a quantitative neural network method is developed to speed up the identification process. In the dilated convolution neural network, the convolution layer and the pooling layer are replaced by dilated convolution, which reduces the computation cost. The quantitative neural network quantizes the weight parameters to integer powers of two, which transforms the original multiplication operations into shift operations and greatly reduces recognition time. The proposed model was examined and tested on 35,000 banknote images covering four currencies: RMB, HKD, USD and GBP. The experimental results show that the proposed model improves recognition accuracy to 99.89% and reduces recognition time to less than 0.1 millisecond, outperforming the other algorithms in both recognition accuracy and recognition speed.

Keywords: serial number, convolution neural network, image recognition

* Corresponding author. Email address: [email protected] (Kangshun Li)

Preprint submitted to Information Sciences

September 26, 2019

1. Introduction

As the sole identity of banknotes, the serial number plays a crucial role in industrial applications and financial transactions. Fast and accurate recognition of the serial number can help reduce financial crime and improve the stability of the financial market. In real-world applications, banknote images are usually captured by a CIS (Contact Image Sensor) and scanned by low-performance hardware devices, e.g., cash counters, which makes it difficult to recognize the characters of the serial number effectively. Fast and accurate recognition, in turn, helps banks better monitor the circulation of currencies.

In the past few decades, many methods have been proposed to improve recognition accuracy in the field of pattern recognition, and many machine learning techniques have been applied to improve recognition performance [24, 7, 15, 14, 22, 23, 21]. Adankon et al. proposed a least squares SVM (LS-SVM) method based on the margin-maximization principle, which performs well for handwritten character recognition [1]. Andreone et al. developed a pedestrian detection system based on SVM, which has been successfully deployed in daylight applications [2]. Feng et al. proposed a linear classifier fusion method based on SVM and LDF learning for automatic recognition of banknote serial numbers [6]. Although these algorithms achieve good recognition results, they also have shortcomings such as poor generalization ability, complex feature extraction and high computation cost. With the development of artificial intelligence, many deep learning techniques have been developed which promise to break the bottleneck of traditional algorithms and obtain better results [26, 4, 11].
The LeNet-5 network [13] proposed by Yann LeCun in 1998 was an early learning algorithm that reduced the number of network parameters while improving model performance. Later methods include AlexNet [12] and VGGNet [19]. AlexNet uses the nonlinear activation function ReLU [17] and Dropout [9] to further improve recognition accuracy, and VGGNet constructs a 19-layer deep convolution neural network by studying the relationships among different convolution layers. These methods focus mainly on recognition accuracy and show good performances on high-resolution images, but they ignore recognition speed. Recognition speed is also important for serial number recognition, so it is necessary to shorten the recognition time for banknote serial numbers.

Recently, many methods have been proposed to reduce model size, which has been shown to shorten recognition time effectively. The most common approach is to compress the network by pruning a trained dense network [8]; this suits networks that contain many redundant parameters. Another effective approach is to use a compact network model with fewer parameters: GoogLeNet [20] and ResNet [25] compact their models by replacing fully connected layers with simpler global average pooling, and SqueezeNet [10] reduces the parameter count by replacing 3 × 3 filters with 1 × 1 filters in a deeper network structure. A third approach compresses the network by quantizing full-precision weights to a small number of bits [5, 27]: XNOR-Net [18], which uses only one bit per weight, converts the original convolution into an XOR operation, and the ternary neural network [16] quantizes each weight to {−1, 0, 1}. However, these methods rarely consider the accuracy loss incurred in quantization. Because of the time cost of floating-point parameters, it is difficult to reduce recognition time without losing recognition accuracy.

Since the character images of serial numbers on banknotes have few pixels, even a little noise can influence the recognition result. As mentioned above, recognition must be done in a very short time with high accuracy. How to improve recognition accuracy while speeding up the recognition of banknote serial numbers remains a challenging research issue. In this paper, to enhance the performance of banknote serial number recognition, we introduce a hybrid convolution network model for serial number recognition.
In this hybrid convolution network model, we develop two networks to handle recognition accuracy and recognition speed respectively. For recognition accuracy, we design a dilated convolution network that has a larger receptive field than an ordinary convolution network and can obtain more accurate results in limited time. For recognition speed, we propose a quantitative neural network in which the parameters are quantized from floats to integers, further shortening the recognition time. The main contributions of this paper are summarized as follows.

• A new dilated convolution based network (Dilat-Net) with a smaller structure is developed, which includes only a dilated convolution layer and a fully connected layer. In Dilat-Net, instead of using a convolution layer and a pooling layer, image features are extracted by dilated convolution, which achieves fast and efficient recognition by expanding the receptive field of the convolution kernel without losing pixel information of the serial numbers.

• A new quantization method is proposed that compresses the network model by quantizing each parameter of the trained model to an integer power of two. Unlike other compression methods, this quantization method reduces running time significantly.

• To further improve quantization efficiency, an adaptive adjustment scheme for quantitative network training is proposed, which further shortens recognition time and makes the network easier to deploy on hardware with limited memory.

The rest of this paper is organized as follows. Section 2 introduces the procedure of banknote image preprocessing. Section 3 describes the dilated convolution based network. Section 4 presents the quantization process of the quantized network, and the experimental results and discussion are presented in Section 5.

2. Image preprocessing

Before recognizing the serial number, we need to locate the serial number characters on the scanned banknote image. The original gray images scanned from the CIS (Contact Image Sensor) contain various kinds of complex noise, such as twisted characters and artificial creases, so it is difficult to extract the serial number characters directly, and image preprocessing is required before recognition. There are two main preprocessing tasks. The first is serial number detection, which locates the serial number region by skew correction and orientation identification. The second is character segmentation. In this section, we use RMB as the example to introduce the details of banknote image preprocessing.

2.1. Serial number region detection

2.1.1. Skew correction

During currency scanning, banknotes usually cannot enter the scanner perfectly straight, so the scanned bitmap image is sometimes skewed, as shown in Figure 1, which affects the accuracy and efficiency of recognition. Therefore, we first perform skew correction on the scanned banknote image. As Figure 1 shows, the banknotes are surrounded by a black background. The gray value of the black background is 0, which differs sharply from that of the banknote, so the boundary points of the banknote can be determined from the gradient of the gray value. We then estimate the edges of the banknote by a Hough transformation, rotate the skewed image by the inclination angle of the horizontal edge, and remove the invalid black background area around the banknote according to the detected edge information.

Figure 1: Skewed image with black background. Figure 2: Horizontal image without black background.
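To make the skew-correction step concrete, here is a minimal sketch in plain NumPy. It replaces the Hough transform with a least-squares line fit to the top edge of the note, which is enough to illustrate how the inclination angle is recovered from the contrast between the black background (gray value 0) and the banknote; the function name and the least-squares simplification are ours, not the paper's.

```python
import numpy as np

def estimate_skew_angle(img):
    """Estimate the skew angle (degrees) of a banknote on a black background.

    For each column, find the row of the first non-black pixel (the top edge
    of the note), then fit a line to those edge points. This least-squares
    fit is a simplified stand-in for the Hough-transform edge estimate.
    """
    cols = np.arange(img.shape[1])
    # row index of the first pixel brighter than the background (gray 0)
    top = np.argmax(img > 0, axis=0)
    valid = img.max(axis=0) > 0          # ignore all-black columns
    slope = np.polyfit(cols[valid], top[valid], 1)[0]
    return np.degrees(np.arctan(slope))
```

In the full pipeline, the image would then be rotated by the negative of this angle and the black border cropped away using the fitted edges.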

2.1.2. Orientation identification

From Figure 2 we can see that the serial number is located in the bottom-left corner of the banknote with a fixed size. Using this prior information, we can obtain the region of the serial number. However, the orientation of the image is uncertain during sampling. As Figure 3 shows, there are four possible orientations of the image, named front-forward, front-reverse, back-forward and back-reverse. Before extracting the rough region of the serial number, we must first determine the orientation of the image. For the different orientations of different banknotes, the serial number characters always appear on the left or right half of the image. Here we use the RMB '100' pattern template as an example, shown in Figure 4(a). The matching results are obtained by calculating the similarities between the template and the detected image in the four directions mentioned above, which gives the correct direction of the image. Given the orientation, the rough region of the serial number can be obtained from the serial number's location and size. Figure 4(b) shows the rough region of the serial numbers.

Figure 3: The four orientations of an RMB image. (a) Front-forward direction (b) Front-reverse direction (c) Back-forward direction (d) Back-reverse direction

Figure 4: (a) The '100' pattern template (b) The rough region of serial numbers
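The template matching just described reduces to scoring the candidate region against the pattern template under each orientation and keeping the best. A minimal sketch using normalized cross-correlation follows; the four dictionary keys are illustrative stand-ins for the paper's front/back, forward/reverse cases (a real system compares the front and back scans separately), and both function names are ours.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-sized gray images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def best_orientation(region, template):
    """Score the candidate region in four orientations against a pattern
    template (e.g. the RMB '100' pattern) and return the best match."""
    candidates = {
        "forward": region,
        "reverse": np.rot90(region, 2),          # rotated 180 degrees
        "flipped-forward": np.fliplr(region),
        "flipped-reverse": np.flipud(region),
    }
    return max(candidates, key=lambda k: ncc(candidates[k], template))
```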

2.2. Character segmentation

After obtaining the region of the serial number, we need to extract every single character as the input of recognition by character segmentation. Since the serial number area contains complex background noise that would dramatically reduce the accuracy of character recognition, extracting single characters from this area is necessary to improve the quality of the input images.

2.2.1. Image binarization

To improve the accuracy of character segmentation, we first remove irrelevant information such as background and noise. Image binarization is the most effective way to separate the character area from the background according to a threshold. Because banknotes are affected by abrasion and creasing in circulation, existing binarization algorithms often fail to obtain good results. To further improve the binarization, we use the binarization method (BwB) proposed in our previous work [28]. In BwB, we first propose a new global binarization algorithm named BACSGH, which is associated only with the overall gray distribution of pixels. Since BACSGH may ignore some fine details of the character strokes, it can be inefficient, especially against a complex background. We therefore further propose a local binarization algorithm named BANSEGR, which labels pixels by their neighborhood information. To obtain more accurate segmentation results, we use not only the global gray distribution of the image but also each pixel's neighborhood information in the binarization process. We combine the two algorithms into the new binarization method BwB, which integrates local and global information simultaneously:

$BwB(x, y) = \begin{cases} 0, & \text{if } BANSEGR(x, y) = 0 \text{ and } BACSGH(x, y) = 0 \\ 1, & \text{otherwise} \end{cases}$  (1)

In BwB, a pixel is labeled as a character stroke (gray value 0) when both binarization algorithms simultaneously identify it as a stroke; otherwise it is labeled as background (gray value 1). The BwB method effectively reduces misjudgments of character strokes even against a background with complex noise.

2.2.2. Character extraction

Because of the complex background of banknotes, it is difficult to remove the background noise completely by binarization. To improve the accuracy of character recognition, the character boundaries should be set as close to the character itself as possible. The character bounding rectangles can be located in the precise serial number area using prior information such as the number of characters, the maximum character width and the character spacing. Here, we use the adaptive character extraction algorithm (ACE) [28] to adjust the horizontal and vertical boundaries and obtain more accurate character segmentation. For a precise binary image, we estimate the left and right boundaries of each character from the character width, and ACE adjusts the boundaries of each character using the local contrast average and the distance between its center of gravity and its geometric center. After adjusting the horizontal and vertical boundaries, the characters can be extracted efficiently.
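The combination rule of Formula (1) can be sketched in a few lines. In the version below, a plain global threshold stands in for BACSGH and a local "darker than the neighborhood mean" test stands in for BANSEGR; both are deliberate simplifications of the paper's algorithms, and the parameter names are ours. Only the AND-combination itself follows Formula (1).

```python
import numpy as np

def bwb(img, global_thresh, local_radius=1, local_k=0.9):
    """Sketch of the BwB idea: a pixel is a character stroke (0) only when
    BOTH a global test (stand-in for BACSGH) and a local neighborhood test
    (stand-in for BANSEGR) mark it as a stroke; otherwise background (1)."""
    # global test: dark pixels relative to one image-wide threshold
    g = img < global_thresh
    # local test: pixel darker than local_k times its neighborhood mean
    pad = np.pad(img.astype(float), local_radius, mode="edge")
    h, w = img.shape
    win = 2 * local_radius + 1
    local_mean = np.zeros((h, w))
    for dy in range(win):
        for dx in range(win):
            local_mean += pad[dy:dy + h, dx:dx + w]
    local_mean /= win * win
    loc = img < local_k * local_mean
    # Formula (1): stroke (0) only where both tests agree
    return np.where(g & loc, 0, 1).astype(np.uint8)
```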


3. A hybrid convolution network model for serial number recognition

In this section, we introduce the hybrid convolution network model for serial number recognition. A dilated convolution neural network is designed to improve recognition accuracy, and a quantitative neural network is used to accelerate the model.

3.1. Dilated convolution neural network (Dilat-Net)

Compared with other convolution neural networks, the dilated convolution neural network has several advantages for serial number recognition. Firstly, it replaces the original convolution layer with a dilated convolution layer, which greatly reduces the computation cost. As shown in Figure 5, the dilated convolution network differs from the classical convolution network: a weight in a normal convolution kernel corresponds to an area of size 1 × 1 in the original image, while in the dilated convolution it corresponds to an area of size 2 × 2. This means the dilated convolution has a larger receptive field than ordinary convolution even with the same number of weight parameters. Conversely, if the dilated convolution network and the ordinary convolution network have the same receptive field, as shown in Figure 6, the dilated convolution has fewer weight parameters, which guarantees lower computation cost and faster learning in the recognition process.

Figure 5: (a) Ordinary convolution (b) Dilated convolution
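The receptive-field argument can be checked numerically. The sketch below (plain NumPy, helper names ours) expands a 3 × 3 kernel with dilation rate 2 into an equivalent sparse 5 × 5 kernel: the nine weights are unchanged, but the image area they cover grows from 3 × 3 to 5 × 5.

```python
import numpy as np

def conv2d_valid(img, ker):
    """Plain 'valid' 2-D correlation in NumPy."""
    kh, kw = ker.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = (img[y:y + kh, x:x + kw] * ker).sum()
    return out

def dilate_kernel(ker, rate):
    """Insert (rate - 1) zeros between kernel taps: a 3x3 kernel at rate 2
    covers a 5x5 receptive field with the same 9 weights."""
    kh, kw = ker.shape
    big = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    big[::rate, ::rate] = ker
    return big
```

Convolving with `dilate_kernel(ker, 2)` gives exactly the dilated convolution: each output is the weighted sum of the image sampled on a strided grid.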

Secondly, since the information of each pixel in a low-resolution image plays a crucial role in recognition, the dilated convolution neural network removes the pooling layer to guarantee that image information is not lost during recognition, which helps improve recognition accuracy. Figure 7(b) shows max pooling

Figure 6: (a) Ordinary convolution (b) Dilated convolution

Figure 7: (a) Receptive field (b) Max pooling

whose receptive field is 2 × 2: features are obtained by selecting the largest value in each 2 × 2 area of the image. The pooling layer loses image information while expanding the receptive field. In contrast, the dilated convolution can expand the receptive field without losing image information. Figure 7(a) shows a dilated convolution whose receptive field is 3 × 3, with no loss of image information relative to the pooling layer. Using the dilated convolution network therefore both reduces the number of layers in the network and improves recognition accuracy.

Unlike other convolution neural networks, the dilated convolution neural network has a very simple structure, which cuts down the computation cost of network training. Figure 8 shows the structure of the dilated convolution neural network (Dilat-Net) designed for banknote serial number recognition; there are five layers in the Dilat-Net network. The first layer is the input layer, whose image size is 16 × 16. The second layer is a dilated convolution layer with convolution kernels of size 2 × 2. To combine the feature information obtained by the dilated convolution layer and identify the serial number more accurately, the third layer is a fully connected layer. We adopt ReLU as the activation function, which costs less computation time than Sigmoid. The specific process of Dilat-Net is shown in Algorithm 1.

Figure 8: Dilated convolution based network

Algorithm 1 Dilated-based convolution neural network
Input: single character image: image; dilated convolution kernel weights: W_l, 1 ≤ l ≤ n; fully connected layer weights: W_f, 1 ≤ f ≤ 36.
Output: label of the single character image.
1: Initialize conv_results_l (1 ≤ l ≤ n) and full_results_f (1 ≤ f ≤ 36)
2: for l = 1, 2, ..., n do
3:     conv_results_l ← image ∗ W_l
4: end for
5: for l = 1, 2, ..., n do
6:     relu_results_l ← ReLU(conv_results_l)
7: end for
8: for f = 1, 2, ..., 36 do
9:     full_results_f ← relu_results ∗ W_f
10: end for
11: Calculate final_output through the softmax function
12: Select the maximum value in final_output
13: Set the index of the maximum value as the label of the single character image
14: Output the label of the single character image
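Algorithm 1 can be sanity-checked with a pure-NumPy forward pass of a Dilat-Net-like model: 16 × 16 input, 2 × 2 dilated convolution, ReLU, a fully connected layer, and a softmax over 36 classes (0-9, A-Z). The number of kernels (8) and the dilation rate (2) are our assumptions for illustration, since the paper does not fix them at this point; the weights are random, so this shows only shapes and data flow, not a trained recognizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dilated_conv(img, ker, rate=2):
    """'Valid' convolution of a small kernel applied with dilation `rate`."""
    span = (ker.shape[0] - 1) * rate + 1       # receptive field per output
    oh = img.shape[0] - span + 1
    ow = img.shape[1] - span + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            patch = img[y:y + span:rate, x:x + span:rate]
            out[y, x] = (patch * ker).sum()
    return out

def dilat_net(img, conv_kernels, fc_weights):
    """Forward pass: dilated conv -> ReLU -> flatten -> FC -> softmax."""
    feats = np.concatenate(
        [relu(dilated_conv(img, k)).ravel() for k in conv_kernels])
    return softmax(feats @ fc_weights)

# hypothetical sizes: 8 kernels of 2x2 at rate 2 -> 14x14 maps on 16x16 input
kernels = [rng.standard_normal((2, 2)) for _ in range(8)]
fc = rng.standard_normal((8 * 14 * 14, 36)) * 0.01
probs = dilat_net(rng.standard_normal((16, 16)), kernels, fc)
```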


3.2. Quantitative neural network

As mentioned above, the dilated convolution network helps achieve better recognition performance for serial numbers. However, it contains many convolution operations, which are very time-consuming, and its weight parameters are 32-bit floating-point numbers, which occupy a lot of memory and calculation time. To further improve recognition speed and compress the network model, we propose a quantitative neural network in which the parameters are quantized to integer powers of two. To guarantee recognition accuracy, an adaptive quantization training scheme is proposed that improves quantization efficiency. Compared with other methods, the proposed quantization method can compress the model and optimize the convolution calculation while maintaining recognition accuracy.

3.2.1. Quantization approach

Multiplication is the main operation in the convolution process and costs a large amount of calculation time; the convolution operation requires both longer prediction time and higher-performance hardware. To optimize the convolution calculation, each weight parameter in the full-precision network is quantized to an integer power of two, so that the original multiplication-based convolution can be converted into a shift operation. To minimize the difference between the quantized and original weight parameters, and thus reduce the accuracy loss after quantization as much as possible, the quantization parameter set is determined from the original network weights. Assume that the weight parameters are quantized from 32-bit floats to b-bit parameters; the quantization interval P_l is determined by Formula (2):

$P_l = \{\pm 2^{n_1}, \cdots, \pm 2^{n_2}, 0\}$  (2)

where n_1 and n_2 are integers, and n_2 is determined by the maximum absolute value of all weights in the current layer. The computation of n_2 is given by Formula (3), where max(|W_l|) is the weight with the maximum absolute value in the l-th layer of the network:

$2^{n_2 - 1} \le \max(|W_l|) \le 2^{n_2}$  (3)

If n_1 ≤ n_2, then given the integer n_2 and the quantization bit-width b, n_1 can be calculated by Formula (4):

$n_1 = n_2 + 1 - \frac{2^{b-1}}{2}$  (4)

Since floating-point parameters are not suitable for shift operations, before quantizing the weights a quantization coefficient k is set to avoid generating floating-point network parameters during quantization. The value of k is determined by the weight with the minimum absolute value in the network, as given by Formula (5):

$k = 2^m, \quad m = \begin{cases} 0, & \text{if } \min(|W|) \ge 1 \\ \lfloor \log_2 \min(|W|) \rfloor, & \text{otherwise} \end{cases}$  (5)

After determining these parameters, the quantized weights are set as follows:

$W_l(i, j) = \begin{cases} k\beta\,\mathrm{sgn}(W_l(i, j)), & \text{if } \frac{\alpha + \beta}{2} \le |W_l(i, j)| < \frac{3\beta}{2} \\ 0, & \text{otherwise} \end{cases}$  (6)

where α and β are two adjacent elements in the set P_l. It should be pointed out that we round the weight parameters to the nearest value in the quantization process. Algorithm 2 states the quantization procedure.

3.2.2. Adaptive adjustment scheme

If the quantization approach is used directly to quantize all weight parameters in a network, it incurs a loss of recognition accuracy. To guarantee that the quantized network loses no accuracy relative to the full-precision network, an adaptive adjustment scheme is proposed for the quantization training process. The adaptive adjustment scheme determines the quantization ratio of the parameters in each quantization step. Recognition accuracy can be recovered by updating and training the remaining parameters through back-propagation. Consequently, before each quantization iteration, some parameters are first trained without quantization to reach higher accuracy. If retraining makes the accuracy higher than, or very close to, that before quantization, the training of the quantitative network stops and the next quantization step starts.

Algorithm 2 Quantitative neural network
Input: pre-trained full-precision CNN model: Net{W_l : 1 ≤ l ≤ L}; bit-width of quantization weights: b; number of weights in the l-th layer: num_l; weight of the largest absolute value in the l-th layer: max(|W_l|); length of P_l: len_p.
Output: final quantitative model with the weights constrained to be either powers of two or zero: BitNet.
1: Calculate k according to Formula (5)
2: for i = 1, 2, ..., L do
3:     Calculate n_2 according to Formula (3)
4:     Calculate n_1 according to Formula (4) and max(|W_i|)
5:     Determine the quantization range P_l of the weights according to Formula (2)
6: end for
7: for i = 1, 2, ..., L do
8:     for j = 1, 2, ..., num_i do
9:         for m = 1, 2, ..., len_p do
10:             α = P_l[m], β = P_l[m + 1]
11:             if (α + β)/2 ≤ |W_l(i, j)| < 3β/2 then
12:                 Quantize the weight W_i[j] by Formula (6)
13:             else
14:                 W_i[j] = 0
15:             end if
16:         end for
17:     end for
18: end for
19: Output the result.
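The core of Algorithm 2 — snapping each weight onto the power-of-two grid of Formulas (2)-(4) — can be sketched as follows. For simplicity this version rounds each weight to the nearest grid magnitude (which matches the interval test of Formula (6) up to edge cases) and omits the integer scaling coefficient k of Formula (5); the function name and the rounding shortcut are ours.

```python
import numpy as np

def quantize_layer(W, b=4):
    """Quantize a layer's weights to {0} U {±2^n1, ..., ±2^n2}.

    Each |w| is snapped to the nearest grid magnitude and the sign is
    kept -- a shortcut that approximates the interval test of Formula (6).
    """
    # n2: the exponent with 2^(n2-1) <= max|W| <= 2^n2   (Formula 3)
    n2 = int(np.ceil(np.log2(np.abs(W).max())))
    # n1 from the bit-width b                            (Formula 4)
    n1 = int(n2 + 1 - 2 ** (b - 1) / 2)
    grid = np.concatenate([[0.0], 2.0 ** np.arange(n1, n2 + 1)])
    # index of the nearest grid magnitude for each |w|
    idx = np.abs(np.abs(W)[..., None] - grid).argmin(axis=-1)
    return np.sign(W) * grid[idx]
```

With the weights restricted to powers of two, each product w · x becomes a bit-shift of x by log2|w| plus a sign flip, which is where the speedup comes from.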

The quantization process is repeated until all parameters in the quantitative network have been quantized. Since weight parameters with larger absolute values have a greater impact on recognition accuracy, they are quantized first. We sort the parameter values in descending order and select the top q% of parameters to quantize, where q% is the quantization ratio mentioned above. Then the remaining parameters are trained and updated by back-propagation. To ensure that parameters which have already been quantized are not updated, we use a label matrix T to distinguish quantized from unquantized parameters during back-propagation. In the label matrix T, each entry corresponds to a weight value of the corresponding layer; when an entry of T is 0, the corresponding weight is not to be updated. The weight parameters are updated by the following formula:

$W_l \leftarrow W_l - \eta \frac{\partial E}{\partial W_l} \cdot T_l$  (7)

where E is the training loss of the network, W_l is the weight in the l-th layer, T_l is the label matrix of the l-th layer and η is the learning rate.

The quantization ratio is important to the recognition performance of the quantitative network. We first select the quantization ratio randomly in the interval (r, r + 25%], where r is the quantization ratio of the previous quantization step. Then the adaptive adjustment scheme determines the optimal quantization scale: the quantization ratio is updated with an incremental step of 0.05 each time, and QuantCo in Formula (8) measures the quantization coefficient of the current quantization scale, from which the optimal scale is determined. In Formula (8), Δr is the difference between the current and previous quantization ratios, and ΔValAcc is the accuracy difference between the networks quantized with different ratios. It should be pointed out that the next quantization scale refers to the quantization ratio determined in the previous quantization step; for example, if the previous quantization ratio is r_1, the network has already been quantized without accuracy loss during retraining.

$QuantCo = \frac{\Delta ValAcc}{\Delta r}$  (8)
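The selective update of Formula (7) is a one-liner once the label matrix is built. In the sketch below (NumPy, helper names ours), `top_q_mask` marks the largest q-fraction of weights for quantization (mask value 0, i.e. frozen, matching the convention that a 0 entry of T means "do not update"), and `masked_update` applies the gradient step only where the mask is 1.

```python
import numpy as np

def top_q_mask(W, q):
    """Label matrix T: 0 for the largest q-fraction of |W| (quantized and
    frozen), 1 for the weights that continue training."""
    k = int(np.ceil(q * W.size))
    thresh = np.sort(np.abs(W).ravel())[-k]
    return (np.abs(W) < thresh).astype(float)

def masked_update(W, grad, T, lr=0.1):
    """Formula (7): W <- W - lr * dE/dW * T; frozen weights stay fixed."""
    return W - lr * grad * T
```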

If the quantization ratio is 1, the accuracy loss cannot be avoided by retraining. We therefore set a threshold θ for the quantization-ratio update: if the quantization ratio is greater than or equal to θ, the incremental step of the quantization ratio becomes smaller. The quantization ratio can be adaptively adjusted by

Algorithm 3 Adaptive adjustment scheme in quantitative network training
Input: pre-trained full-precision CNN model: Net{W_l : 1 ≤ l ≤ L}; ratio of quantization: r%; bit-width of quantization weights: b; number of weights in the l-th layer: num_l.
Output: final quantitative model with the weights constrained to be either powers of two or zero: BitNet.
1: Initialize T_l ← 1 for 1 ≤ l ≤ L
2: for i = 1, 2, ..., L do
3:     Sort the weights W_i from largest to smallest to obtain the sequence Wsort_i
4: end for
5: for i = 1, 2, ..., L do
6:     for j = 1, 2, ..., q × num_l do
7:         Quantize the weight W_i[j] according to Algorithm 2
8:     end for
9: end for
10: Calculate QuantCo according to Formula (8) when the quantization ratio is r
11: Repeat the above steps until QuantCo is maximized, obtaining the most appropriate quantization ratio r_1
12: if r_1 > θ then
13:     Determine the quantization ratio by Formula (9)
14: end if
15: Retrain the remaining (1 − r_1) weights until the accuracy is close to that of the pre-trained full-precision model
16: Repeat the above steps until all the weights of the network have been quantized
17: Output the result.

Formula (9):

$r_1 = \begin{cases} r + \Delta r, & \text{when } r \le \theta \text{ and } \max(QuantCo) \\ r + \Delta r, & \text{when } r > \theta \text{ and } \min(QuantCo) \end{cases}$  (9)

where r_1 indicates the current quantization ratio of the quantitative network under the adaptive adjustment scheme. Algorithm 3 states the adaptive adjustment scheme for quantitative network training.

4. Experimental results and discussion

To verify the effectiveness of the hybrid convolution network, we compare its results against several traditional methods in practical applications using standard evaluation metrics. All experiments are conducted on a PC with an Intel Core i7-4790 running at 3.6 GHz with 8 GB of RAM. The operating system is Windows 7 and the IDE is PyCharm 2017. The deep learning framework is TensorFlow.

4.1. Data set

We use a data set of 35,000 character images covering RMB, HKD, USD and GBP, obtained after serial number segmentation; the normalized image size is 16 × 16. Figure 9 shows the character categories in the serial number recognition dataset. In the network training process, we use mini-batch gradient descent [3] with a batch size of 500.

Figure 9: Character images

4.2. Evaluation metrics

Model performance is evaluated by two widely used metrics, Accuracy (10) and Speed (11):

$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$  (10)

where TP and TN are the numbers of true positives and true negatives, and FP and FN are the numbers of false positives and false negatives.

$Speed = \frac{Time}{N}$  (11)

where Time is the total recognition time over all images, measured in CPU time, and N is the total number of images. It should be pointed out that Speed is an average value computed statistically.
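Both metrics reduce to simple ratios; a minimal sketch with our own function names and hypothetical counts:

```python
def accuracy(tp, fp, tn, fn):
    """Formula (10): fraction of correctly classified samples."""
    return (tp + tn) / (tp + fp + tn + fn)

def speed(total_time, n):
    """Formula (11): average CPU time per image, in the units of total_time."""
    return total_time / n

# hypothetical counts: 990 of 1000 decisions correct -> accuracy 0.99;
# 3.5 s of CPU time over 35,000 images -> 0.1 ms per image
acc = accuracy(tp=90, fp=5, tn=900, fn=5)
per_image = speed(3.5, 35000)
```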

4.3. Comparison networks

In order to verify the performance of the proposed hybrid network, we select several convolution neural networks with different structures that have already been widely used in other recognition tasks for comparison. For the convolution kernel size, we select networks whose kernels are 3 × 3 or 5 × 5, and the pooling size is 2 × 2. As serial number recognition must be completed in a very short time, we choose simple CNNs with one or two convolution and pooling layers, and the number of convolution kernels is set to no more than 24.

4.4. Experimental results

The experiments are conducted in the following three steps. First, to verify the performance of the proposed dilated-based convolution neural network, we compare its recognition results with those of several convolution neural networks. Second, we verify whether the quantitative network can speed up recognition by comparing the results of different models with the quantitative network. Finally, to further verify the improvement in recognition accuracy, we compare the models with the quantitative network using the adaptive adjustment scheme.

Many variants of recognition models based on convolution neural networks (CNNs) have shown good performance on different datasets. We therefore compare our results with other CNN-based recognition models, as shown in Table 1. In Table 1, the numbers before conv and pool represent the numbers of convolution layers and pooling layers. In the first row of the table, Conv means the size of the convolution kernel used in the network model, and Pool means the size of the pooling layer; the numbers after Conv and Pool indicate the corresponding layer of the network. For example, Conv1 is the kernel size of the first convolution layer, and Pool1 is the size of the first pooling layer.
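As background for the dilated-based network compared below, the effect of kernel dilation can be illustrated with a minimal single-channel NumPy sketch. `dilated_conv2d` is our own illustrative helper, not the paper's implementation, which operates on multi-channel feature maps in TensorFlow.

```python
import numpy as np

def dilated_conv2d(img, kernel, dilation=1):
    """Valid 2D convolution of a single-channel image with a dilated kernel.
    A dilation of d samples the input with gaps of d-1 pixels, enlarging the
    receptive field without adding parameters or a pooling layer."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1        # effective kernel height
    eff_w = (kw - 1) * dilation + 1        # effective kernel width
    H, W = img.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out
```

With dilation = 1 this reduces to ordinary convolution; with dilation = 2, a 2 × 2 kernel covers a 3 × 3 region of the input.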

Table 1: Recognition results of different convolution networks

Name        | Conv1  | Pool1 | Conv2  | Pool2 | Accuracy | Speed
1conv       | 3*3*16 | -     | -      | -     | 99.66%   | 0.239ms
1conv       | 5*5*16 | -     | -      | -     | 99.62%   | 0.272ms
1conv+1pool | 3*3*20 | 2*2   | -      | -     | 99.58%   | 0.261ms
1conv+1pool | 5*5*20 | 2*2   | -      | -     | 99.60%   | 0.280ms
2conv       | 5*5*24 | -     | 5*5*16 | -     | 99.72%   | 0.931ms
2conv+1pool | 5*5*24 | 2*2   | 5*5*16 | -     | 99.71%   | 0.354ms
2conv+2pool | 5*5*24 | 2*2   | 3*3*16 | 2*2   | 99.67%   | 0.385ms
2conv+2pool | 5*5*16 | 2*2   | 3*3*16 | 2*2   | 99.57%   | 0.300ms
Dilat-Net   | 2*2*16 | -     | -      | -     | 99.9%    | 0.204ms

From Table 1, we can see that, since CNNs have good generalization ability and can extract image features well, all network models achieve a recognition accuracy above 99% on our test samples, but only our Dilat-Net reaches 99.9%. More specifically, Dilat-Net also has the fastest recognition speed, at 0.204 ms. We also find that the dilated-based convolution neural network and the networks with two convolution layers achieve better recognition accuracy than the networks with one convolution layer. The reason is that a network structure with only one convolution layer is relatively simple, so the character features are not well extracted during training. Since some information (e.g., character strokes) may be lost during pooling, the pooling layer can also reduce recognition accuracy. Compared with the other network models, the proposed Dilat-Net shows a clear improvement in both recognition accuracy and recognition speed.

4.5. Quantitative network results

In Table 2, we use +Q to indicate that the network structure uses the quantization method. From the results in Table 2 and Table 1, we can see that, in the quantitative networks, the recognition time is greatly reduced because the quantization method transforms the original multiplication operations into shift operations. However, since there is no retraining in the quantization process, the recognition accuracy declines markedly for most networks, except for 2conv+2pool+Q and Dilat-Net+Q. Although Dilat-Net still outperforms the others in both recognition accuracy and recognition speed, its recognition accuracy decreases from 99.9% to 95.06%. To overcome this drawback, we further propose an adaptive adjustment scheme for quantitative network training.

Table 2: Recognition results of different quantitative networks without adaptive adjustment scheme

Name          | Conv1  | Pool1 | Conv2  | Pool2 | Accuracy | Speed
1conv+Q       | 3*3*16 | -     | -      | -     | 60.13%   | 0.140ms
1conv+Q       | 5*5*16 | -     | -      | -     | 68.73%   | 0.167ms
1conv+1pool+Q | 3*3*20 | 2*2   | -      | -     | 68.63%   | 0.152ms
1conv+1pool+Q | 5*5*20 | 2*2   | -      | -     | 64.83%   | 0.192ms
2conv+Q       | 5*5*24 | -     | 5*5*16 | -     | 56.18%   | 0.531ms
2conv+1pool+Q | 5*5*24 | 2*2   | 5*5*16 | -     | 52.99%   | 0.304ms
2conv+2pool+Q | 5*5*24 | 2*2   | 3*3*16 | 2*2   | 94.64%   | 0.325ms
2conv+2pool+Q | 5*5*16 | 2*2   | 3*3*16 | 2*2   | 94.85%   | 0.230ms
Dilat-Net+Q   | 2*2*16 | -     | -      | -     | 95.06%   | 0.102ms

Table 3: Recognition results of quantitative networks with adaptive adjustment scheme

Name            | Conv1  | Pool1 | Conv2  | Pool2 | Accuracy | Speed
2conv+1pool+Q   | 5*5*24 | 2*2   | 5*5*16 | -     | 52.99%   | 0.304ms
2conv+1pool+Q+A | 5*5*24 | 2*2   | 5*5*16 | -     | 99.69%   | 0.299ms
2conv+2pool+Q   | 5*5*16 | 2*2   | 3*3*16 | 2*2   | 94.85%   | 0.230ms
2conv+2pool+Q+A | 5*5*16 | 2*2   | 3*3*16 | 2*2   | 99.66%   | 0.228ms
Dilat-Net+Q     | 2*2*16 | -     | -      | -     | 95.26%   | 0.102ms
Dilat-Net+Q+A   | 2*2*16 | -     | -      | -     | 99.89%   | 0.098ms

In Table 3, we use +A to indicate that the network structure uses the adaptive quantization method, and proportion represents the ratio of quantization. As shown in Table 3, compared with the quantization convolution networks (Q), the quantization networks with the adaptive adjustment scheme (Q+A) clearly achieve better recognition accuracy. A possible reason is that the adaptive quantization training scheme considers the accuracy loss in each quantization step, which helps obtain an optimal quantization ratio and thus more accurate recognition results. We also find that, for all Q+A networks, the time cost decreases slightly. Dilat-Net+Q+A achieves a recognition accuracy of 99.89%, which is very close to 99.9%, and its recognition time for each character is less than 0.1 millisecond. That is, for Dilat-Net, the quantization network with the adaptive adjustment scheme can achieve good performance. From the above results, we can see that the proposed dilated-based convolution neural network and quantitative neural network indeed help improve both recognition accuracy and recognition speed.

5. Conclusions

In this paper, in order to obtain effective recognition of serial numbers on banknotes with low resolution and various kinds of noise, a hybrid network model combining a dilated-based convolution neural network and a quantitative neural network has been developed. In the dilated-based convolution neural network, the convolution layer is replaced by a dilated convolution layer and the pooling layer is removed, which not only improves accuracy but also shortens recognition time. In the quantitative neural network, to further compress the network model and improve recognition speed, the weight parameters of the network are quantized to integer powers of two, so that the multiplication operations with high computation cost are converted to shift operations with lower computation cost. Then, in order to improve the efficiency of network quantization and reduce the accuracy loss during quantization, an adaptive adjustment scheme has been employed to determine the ratio of each quantization. The experimental results on 35,000 banknote images show that the proposed hybrid model outperforms the other methods. In the future, we will focus on network compression methods that can further optimize floating-point operations and reduce recognition time. We will also try to develop banknote recognition technologies for mobile devices, which can make banknote recognition more convenient in real applications.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 61773296); the Fundamental Research Funds for the Central Universities (Grant No. 2042018kf0224); and the Research Fund for Academic Team of Young Scholars at Wuhan University (Grant No. Whu2016013).

Declaration of Interest Statement

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

References

[1] Adankon, M. M., Cheriet, M., 2009. Model selection for the LS-SVM: application to handwriting recognition. Pattern Recognition 42 (12), 3264–3270.
[2] Andreone, L., Bellotti, F., Gloria, A. D., Lauletta, R., 2005. SVM-based pedestrian recognition on near-infrared images. In: International Symposium on Image and Signal Processing and Analysis. pp. 274–278.
[3] Bottou, L., Curtis, F. E., Nocedal, J., 2018. Optimization methods for large-scale machine learning. SIAM Review 60 (2), 223–311.
[4] Cai, Y., Liu, X., Zhang, Y., Cai, Z., 2018. Hierarchical ensemble of extreme learning machine. Pattern Recognition Letters 116, 101–106.
[5] Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y., 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1.
[6] Feng, B. Y., Ren, M., Zhang, X. Y., Suen, C. Y., 2014. Automatic recognition of serial numbers in bank notes. Pattern Recognition 47 (8), 2621–2634.
[7] Gong, W., Cai, Z., 2013. Parameter extraction of solar cell models using repaired adaptive differential evolution. Solar Energy 94, 209–220.
[8] Han, S., Mao, H., Dally, W. J., 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding.
[9] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2012. Improving neural networks by preventing co-adaptation of feature detectors.


[10] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., Keutzer, K., 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size.
[11] Jiang, L., Zhang, L., Li, C., Wu, J., 2019. A correlation-based feature weighting filter for naive Bayes. IEEE Transactions on Knowledge and Data Engineering 31 (2), 201–213.
[12] Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet classification with deep convolutional neural networks. In: Proceedings of 26th Annual Conference on Neural Information Processing Systems 2012. pp. 1106–1114.
[13] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. In: Proceedings of the IEEE. Vol. 86. pp. 2278–2324.
[14] Li, K., Liang, Z., Yang, S., Chen, Z., Wang, H., Lin, Z., 2019. Performance analyses of differential evolution algorithm based on dynamic fitness landscape. International Journal of Cognitive Informatics and Natural Intelligence 13 (1), 36–61.
[15] Li, K., Wang, H., Li, S., 2019. A mobile node localization algorithm based on an overlapping self-adjustment mechanism. Information Sciences 481, 635–649.
[16] Mellempudi, N., Kundu, A., Mudigere, D., Das, D., Kaul, B., Dubey, P., 2017. Ternary neural networks with fine-grained quantization.
[17] Nair, V., Hinton, G. E., 2010. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning, ICML 2010. pp. 807–814.
[18] Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A., 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. In: European Conference on Computer Vision, ECCV 2016. pp. 525–542.
[19] Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations, ICLR 2015.

[20] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2014. Going deeper with convolutions, 1–9.
[21] Wang, F., Li, Y., Zhang, H., Hu, T., Shen, X., 2019. An adaptive weight vector guided evolutionary algorithm for preference-based multiobjective optimization. Swarm & Evolutionary Computation 49, 220–233.
[22] Wang, F., Zhang, H., Li, K., Lin, Z., Yang, J., Shen, X.-L., 2018. A hybrid particle swarm optimization algorithm using adaptive learning strategy. Information Sciences 436-437, 162–177.
[23] Wang, F., Zhang, Y., Rao, Q., Li, K., Zhang, H., 2017. Exploring mutual information-based sentimental analysis with kernel-based extreme learning machine for stock prediction. Soft Computing 21 (12), 3193–3205.
[24] Wu, B., Qian, C., Ni, W., Fan, S., 2012. Hybrid harmony search and artificial bee colony algorithm for global optimization problems. Computers & Mathematics with Applications 64 (8), 2621–2634.
[25] Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K., 2016. Aggregated residual transformations for deep neural networks, 5987–5995.
[26] Zhang, Y., Wu, J., Zhou, C., Cai, Z., 2017. Instance cloned extreme learning machine. Pattern Recognition 68, 52–65.
[27] Zhou, A., Yao, A., Guo, Y., Xu, L., Chen, Y., 2016. Incremental network quantization: Towards lossless CNNs with low-precision weights. In: International Conference on Learning Representations, ICLR 2016. pp. 1–13.
[28] Zhou, J., Wang, F., Xu, J., Yan, Y., Zhu, H., 2019. A novel character segmentation method for serial number on banknotes with complex background. Journal of Ambient Intelligence and Humanized Computing 10 (8), 2955–2969.
