Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 147 (2019) 198–202
www.elsevier.com/locate/procedia
2018 International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2018
Modeling and recognition of human limbs cooperative interaction based on Random Increased Hybrid Learning Machine
Xiaoying Zhang^a, Xiaojuan Ban^a, Zheng Chang^a, Ting Liu^a
^a University of Science & Technology Beijing, No.30, Xueyuan Road, Haidian District, Beijing, 100083, China
Abstract

In this paper, we study the identification problem of human limbs synergistic interaction in the context of natural human-computer interaction. We propose an identification model of human limbs cooperative interaction based on the Random Increased Hybrid Learning Machine (RIHLM) to meet the particular requirements of natural human-computer interaction applications. The model combines the motion region image, the Hu invariant moment model and the RIHLM. The algorithm theory, the rationality of the model and the implementation of the RIHLM-based human motion recognition model are described. Finally, by comparing experimental results, we verify that the model presented in this paper has better robustness, timeliness and accuracy in natural human-computer interaction applications.

© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 2018 International Conference on Identification, Information and Knowledge in the Internet of Things.

Keywords: natural human-computer interaction; motion modeling and recognition; motion invariant moment; invariant feature; RIHLM classifier
∗ Xiaojuan Ban. Tel.: +86-010-62334980; fax: +86-010-62332281. E-mail address: [email protected]
1877-0509 © 2019 The Authors. Published by Elsevier B.V.
Xiaoying Zhang et al. / Procedia Computer Science 147 (2019) 198–202
10.1016/j.procs.2019.01.222

1. Introduction

Human-Computer Interaction, or Human-Machine Interaction (HCI or HMI), is the study of the ways and means by which humans and machines interact. When people interact, up to 93% of the information is transmitted through nonlinguistic channels such as speech rate, intonation, facial expressions and gestures, with facial expressions and motion carrying a particularly large share. Because these natural, nonlinguistic means of synergistic interaction between humans carry rich meaning and information, it is necessary to study how they differ between people, for example because of cultural conventions. At the same time, effective modeling of these nonlinguistic means of interaction becomes more important. With the growing demand for human-computer interaction in social applications, efficient motion recognition plays an increasingly important role in human-computer interaction systems. Natural human-computer interaction technology has a wide range of real-world applications, such as disability assistance, virtual reality [1], regional monitoring [2], intelligent transportation and sports analysis [3].

The study of natural human-computer interaction technology first involves establishing a reliable human motion model. Second, it must recognize and interpret human motion effectively and assign a meaning to each classified motion. This paper proposes the Random Increased Hybrid Learning Machine (RIHLM) classifier to learn the invariant features of human motion and to identify human motion effectively in the presence of noise.

2. Question

At present, human motion is modeled mainly in two ways: one extracts human motion features from a video stream and builds the motion model from them; the other builds the model from three-dimensional data of the high-level human structure. Modeling based on the video stream is closer to the real situation of human motion and its features are simple to extract, but the resulting representation lacks three-dimensional information about the human body. Modeling based on the high-level human structure extracts three-dimensional features of human motion; because it uses 3D body information, it is more robust and more accurate [4], but it requires a large amount of computation. Both methods suffer from difficulties related to illumination, occlusion, threshold and delay. To address these problems, this paper puts forward a modeling and recognition model for human limbs collaborative interaction.
The proposed human motion recognition algorithm comprises the Hu invariant moment model and the RIHLM motion classifier.

3. Methods

In this paper, the recognition of human limbs cooperative interaction is divided into three parts. First, human motion is segmented from the video frames to obtain effective motion features. Second, the invariant features required by the human motion recognizer are constructed by extending the traditional Hu invariant moment method, which is defined for two-dimensional features, to three dimensions. Third, the constructed invariant features are learned and recognized. To meet the requirements of extensibility and real-time operation in natural human-computer interaction, the RIHLM classifier is proposed as the recognizer for human motion.

3.1. Collaborative interactive motion modeling

Human motion segmentation

The human motion modeling method adopted in this paper introduces the spatial and temporal information of human motion in the three-dimensional visual environment into the traditional method [5]. The motion region image modeling method first compares corresponding pixels of adjacent frames in the image sequence, and then extracts the region image of the human motion using a set threshold. The video segmentation strategy performs continuous human motion segmentation on the video stream within a certain period of time; the human motion images of that period are then used for three-dimensional human motion modeling. The improved 3D human motion region image model is:

$$D(x, y, z, n) = I(x, y, z, n-1) - 2I(x, y, z, n) + I(x, y, z, n+1) \tag{1}$$

In the above formula, n is the sequence number of the current image frame, I(x, y, z, n) is the gray value of the pixel located at (x, y, z) in three-dimensional space in frame n, and D(x, y, z, n) is the change of pixel gray value between adjacent frames in the human motion area. Γ is a specially selected threshold.
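As an illustration of Eq. (1), the following minimal sketch computes the second-order temporal difference for three consecutive frames and thresholds it with Γ. It assumes each frame is a NumPy array of gray values with identical shape; the function names `frame_difference` and `motion_mask` are hypothetical and do not come from the paper.

```python
import numpy as np


def frame_difference(prev_frame, cur_frame, next_frame):
    """Return D = I(n-1) - 2*I(n) + I(n+1) for three same-shaped frames (Eq. 1)."""
    # Cast to float so that unsigned 8-bit gray values cannot wrap around.
    prev_f = np.asarray(prev_frame, dtype=np.float64)
    cur_f = np.asarray(cur_frame, dtype=np.float64)
    next_f = np.asarray(next_frame, dtype=np.float64)
    return prev_f - 2.0 * cur_f + next_f


def motion_mask(prev_frame, cur_frame, next_frame, gamma):
    """Boolean mask of positions whose difference D exceeds the threshold gamma."""
    return frame_difference(prev_frame, cur_frame, next_frame) > gamma
```

A gray value that changes linearly over the three frames gives D = 0, so only accelerating changes in brightness pass the threshold.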
The three-dimensional motion region image of human motion is calculated as follows:

$$H_\tau(x, y, z, t) = \begin{cases} \tau, & D(x, y, z, n) > \Gamma \\ \max\left(0,\ H_\tau(x, y, z, t-1) - 1\right), & \text{otherwise} \end{cases} \tag{2}$$
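The update rule of Eq. (2) can be sketched in a few lines. This is a minimal illustration, assuming the motion region image and the motion mask (positions where D > Γ) are NumPy arrays of the same shape; the name `update_motion_history` is hypothetical.

```python
import numpy as np


def update_motion_history(history, mask, tau):
    """Apply one step of Eq. (2): tau where motion is detected, otherwise decay by 1."""
    history = np.asarray(history, dtype=np.float64)
    # Positions without motion lose one gray level per step, never going below 0.
    decayed = np.maximum(0.0, history - 1.0)
    return np.where(mask, float(tau), decayed)
```

Calling the function once per frame leaves the most recent motion at value τ and older motion at progressively lower values.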
In Eq. (2), $H_\tau(x, y, z, t)$ is the gray value at time t of the pixel located at (x, y, z) in the 3D motion region image. A higher gray value in the motion segmentation area corresponds to more recent motion. As time goes by, the gray level along the motion trajectory gradually decreases, so the change of gray value in the motion region image reflects the direction of motion.

3.2. Modeling and invariant feature construction of human motion

To overcome sensitivity to the observation point and weak robustness, the invariant moment model is introduced as the feature of the 3D motion region image. To calculate the invariant moments of the 3D human motion trajectory, we project the 3D motion region image onto three mutually orthogonal projection planes and then calculate the invariant moments of the three projection images. For an image f(x, y) of size M × N, its moment $m_{pq}$ of order p + q is defined as:

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} f(x, y)\, x^{p} y^{q}, \quad p, q = 0, 1, 2, \ldots \tag{3}$$
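The moment of Eq. (3) and the features built on it can be sketched as follows. The code evaluates $m_{pq}$ directly, derives normalized central moments, and computes the first two of Hu's seven invariants as an example. The projection by maximum along each axis is an assumption, since the paper does not state how the 3D image is projected, and all function names are hypothetical.

```python
import numpy as np


def _grid(shape):
    """Coordinate grids x = 1..M (rows) and y = 1..N (columns), as in Eq. (3)."""
    M, N = shape
    x = np.arange(1, M + 1, dtype=np.float64)[:, None]
    y = np.arange(1, N + 1, dtype=np.float64)[None, :]
    return x, y


def raw_moment(image, p, q):
    """m_pq = sum_x sum_y f(x, y) * x^p * y^q."""
    f = np.asarray(image, dtype=np.float64)
    x, y = _grid(f.shape)
    return float(np.sum(f * x**p * y**q))


def central_moment(image, p, q):
    """Moment taken about the image centroid (requires a non-zero total mass)."""
    f = np.asarray(image, dtype=np.float64)
    m00 = raw_moment(f, 0, 0)
    xc, yc = raw_moment(f, 1, 0) / m00, raw_moment(f, 0, 1) / m00
    x, y = _grid(f.shape)
    return float(np.sum(f * (x - xc)**p * (y - yc)**q))


def first_two_hu_invariants(image):
    """First two Hu invariants from second-order normalized central moments."""
    mu00 = central_moment(image, 0, 0)

    def eta(p, q):
        return central_moment(image, p, q) / mu00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


def project_volume(volume):
    """Project a 3D array onto three orthogonal planes (max projection is an assumption)."""
    v = np.asarray(volume)
    return [v.max(axis=0), v.max(axis=1), v.max(axis=2)]
```

Shifting or rotating a shape by 90 degrees inside the image leaves both invariants unchanged, which is the property the recognizer relies on.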
Seven invariant moments can be obtained from combinations of the second-order and third-order normalized central moments. These seven invariant moments have three key properties: translation invariance, rotation invariance and scale invariance. Calculating the seven invariant moments of the projected image in each of the three projection directions gives a 3 × 7 feature matrix, which serves as the feature vector of the 3D motion region image of the human motion trajectory. These invariant moment features give the RIHLM classifier effective inputs, allowing it to recognize human motion accurately and robustly.

3.3. Collaborative interactive motion learning and recognition

To learn the invariant moment features of human motion and recognize the motion, this paper proposes the RIHLM classifier. To study the learning and recognition of human limbs collaborative interaction, we first tried traditional machine learning methods (such as the BP network, the support vector machine and the deep belief network). To improve robustness and real-time performance, this paper draws on the EM-ELM proposed by Huang [6] and the B-ELM by Yang [7]. It also takes into account the incremental learning required in real scenes, and constructs invariant features that are insensitive to changes in occlusion position. Combining these two advantages, this paper proposes the RIHLM.

3.3.1. The theory of Random Increased Hybrid Learning Machine

RIHLM is proposed after summarizing the deficiencies of Back Propagation (BP), the Extreme Learning Machine (ELM) and the incremental Extreme Learning Machine (I-ELM) in human motion recognition. It has the following advantages:
(1) The training time is short and the recognition accuracy is high.
(2) The initial network parameters do not need to be selected from experience.
(3) The optimal number of hidden layer nodes is determined by self-growth.
(4) The network contains few low-efficiency hidden layer nodes.
(5) The network adds multiple hidden layer nodes at each increment, so self-growth is fast.

The learning process of RIHLM is as follows:
(1) When the number of hidden layer nodes is even, RIHLM uses back-propagation of the network learning error: at each growth step, it calculates the connection parameters of part of the hidden layer nodes from the network error feedback matrix.
(2) When the number of hidden layer nodes is odd, it uses the principle of the Error Minimized Extreme Learning Machine, calculating the updated network connection parameters by error minimization.
Fig. 1. 3D motion region images of the four gestures: (a) stop; (b) straight; (c) left turn; (d) lane change.
The network error of the RIHLM is monotonically decreasing; that is, the error of the network gradually becomes smaller as the iterations proceed.

3.3.2. The realization of the Random Increased Hybrid Learning Machine

The detailed training algorithm for the Random Increased Hybrid Learning Machine is shown below:

Algorithm 1 The training algorithm for Random Increased Hybrid Learning Machine
Input: The training sample set of size N, (x, t); the hidden layer neuron number, L = 0; the activation function h(x);
Output: Weight matrix β between the hidden layer nodes and the output layer nodes;
1: Set the network initialization parameters for the artificial neural network, including: network training data sample set, $x_i$; training error η; initial output error $e_0$; output error $e_L$ when the number of hidden layer nodes is L; output matrix $H_L$; incremental network iterations k = 0; the number of hidden layer nodes after the kth iteration, $L_k$;
2: while (network error $e_{L_k} > \eta$)
3:   if ($L_k$ is odd)
4:     Compute the output matrix $H_{L_k} = [H_{L_{k-1}}, H_r]$; update the output weight matrix $D_{L_k}$, $U_{L_k}$, $\beta_{L_k}$;
5:   else if ($L_k$ is even)
6:     Calculate the error feedback matrix $H^{e}_{L_k}$; calculate the parameters of the hidden layer neural nodes $a_{L_k}$, $b_{L_k}$; update the output matrix $H_{L_k}$; calculate the output weight matrix $\beta_{L_k} = \dfrac{\langle e_{L_{k-1}}, H_{L_k} \rangle}{\lVert H_{L_k} \rVert^{2}}$;
7:   end if
8:   Calculate the network output error $e_{L_k}$;
9: end while
10: END
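To make the growth loop of Algorithm 1 more concrete, the sketch below implements only a simplified variant: random hidden nodes are added in batches and the output weights β are refit by ordinary least squares until the error falls below η. It is not the authors' RIHLM. The error-feedback step for even node counts and the incremental EM-ELM update are omitted, and the sigmoid activation, the uniform weight range and all names (`train_growing_elm`, `nodes_per_step`, `max_nodes`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_growing_elm(X, T, eta=1e-2, nodes_per_step=2, max_nodes=50, seed=0):
    """Grow random hidden nodes until the RMSE is <= eta or max_nodes is reached."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64)  # expected shape: (n_samples, n_outputs)
    n_samples, n_features = X.shape
    A = np.empty((n_features, 0))        # input weights a, one column per hidden node
    b = np.empty((0,))                   # hidden biases b
    H = np.empty((n_samples, 0))         # hidden layer output matrix H_L
    beta = np.zeros((0, T.shape[1]))     # output weights
    err = float(np.sqrt(np.mean(T ** 2)))  # e_0: error with no hidden nodes
    errors = []
    while err > eta and H.shape[1] < max_nodes:
        # Add a batch of random hidden nodes and extend H with their outputs.
        A_new = rng.uniform(-1.0, 1.0, size=(n_features, nodes_per_step))
        b_new = rng.uniform(-1.0, 1.0, size=nodes_per_step)
        H = np.hstack([H, sigmoid(X @ A_new + b_new)])
        A = np.hstack([A, A_new])
        b = np.concatenate([b, b_new])
        # Refit all output weights by least squares (recomputed from scratch here).
        beta, *_ = np.linalg.lstsq(H, T, rcond=None)
        err = float(np.sqrt(np.mean((H @ beta - T) ** 2)))
        errors.append(err)
    return A, b, beta, errors
```

Because each step only appends columns to H and refits β by least squares, the training error cannot increase from one step to the next, which mirrors the monotone error property stated above.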
4. Experiments

4.1. The experiments design

The experiments are designed to compare the algorithms from several perspectives: robustness, accuracy and real-time performance. The experiment uses the Microsoft Kinect structured-light sensor to capture the 3D motion trajectory data of human motion. During capture, the subject faces forward, stands upright on a horizontal plane and stays 1.2 to 2 meters away from the sensor. Four representative traffic command gestures were selected for the comparative experiments: stop, straight, left turn and lane change. The 3D motion region image of each motion is shown in Figure 1. We collected limb movement trajectory data from 10 students. Each student repeated each of the 4 types of limb movement 20 times, giving a total of 800 data samples. Of these, 400 were randomly selected as the test data set and the remaining 400 formed the training data set.
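The sample count and the random split described above can be reproduced with a short sketch. The ordering of the labels (grouped by student, then gesture) and the random seed are assumptions; the paper only states the totals and that the test set was chosen at random.

```python
import numpy as np


def split_indices(n_samples, n_test, seed=0):
    """Return (train_idx, test_idx) from a random permutation of the sample indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return perm[n_test:], perm[:n_test]


subjects, gestures, repeats = 10, 4, 20
# One gesture label (0..3) per sample: 10 students x 4 gestures x 20 repetitions.
labels = np.tile(np.repeat(np.arange(gestures), repeats), subjects)
train_idx, test_idx = split_indices(len(labels), n_test=400)
```

With these numbers the label array has 800 entries, and the two index sets hold 400 samples each with no overlap.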
Fig. 2. (a) Recognition accuracy; (b) Training time.
4.2. Comparison of different motion recognition algorithms

We ran comparison experiments against other common human motion recognition algorithms. As Figure 2 shows, the algorithms do not differ markedly in accuracy, although the Random Increased Hybrid Learning Machine (RIHLM) ranks first in recognition for motions 1, 2 and 3. The training time of the RIHLM was significantly shorter than that of the other machine learning algorithms. From these two aspects, we conclude that the RIHLM algorithm proposed in this paper has better learning accuracy, real-time performance and robustness in the learning and recognition of human motion.

5. Conclusion

To meet the needs of robustness, real-time performance and accuracy in natural human-computer interaction scenarios for the learning and recognition of human limbs cooperative interaction, this paper proposes a human motion learning and recognition model based on the Random Increased Hybrid Learning Machine. Combining theoretical derivation and experimental analysis, the paper shows that the new model overcomes the shortcomings of traditional methods: it makes the learning and recognition process faster and more efficient and ensures the overall real-time performance of the system, while reducing the overhead generated by machine learning.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Grant No. 2016YFB1001404).

References

[1] A. Nijholt, “Meetings, gatherings, and events in smart environments,” in Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry. ACM, 2004, pp. 229–232.
[2] Y. T. Du, F. Chen, W. L. Xu, and Y. B. Li, “A survey on the vision-based human motion recognition,” Acta Electronica Sinica, vol. 35, no. 1, pp. 84–90, 2007.
[3] T.-T. Ruan, M.-H. Yao, X.-Y. Qu, and Z.-W. Lou, “A survey of vision-based human motion analysis,” Computer Systems and Applications, vol. 20, no. 2, pp. 245–253, 2010.
[4] J. Ohya and F. Kishino, “Human posture estimation from multiple images using genetic algorithm,” in Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 1, Conference A: Computer Vision & Image Processing. IEEE, 1994, pp. 750–753.
[5] Z. Chang, X. Ban, Q. Shen, and J. Guo, “Research on three-dimensional motion history image model and extreme learning machine for human body movement trajectory recognition,” Mathematical Problems in Engineering, vol. 2015, pp. 1–15, 2015.
[6] Y. Lan, Y. C. Soh, and G. B. Huang, “Random search enhancement of error minimized extreme learning machine,” in ESANN 2010, European Symposium on Artificial Neural Networks, Bruges, Belgium, April 28–30, 2010, Proceedings, 2012.
[7] Y. Yang, “Research on extreme learning theory for system identification and application,” Changsha: Hunan University, 2013.