Automatized modeling of a human engineering simulation using Kinect

Chanmo Jun a,b, Ju Yeon Lee a, Bo Hyun Kim a, Sang Do Noh c,⁎

a IT Converged Process R&D Group, Korea Institute of Industrial Technology, 143, Hanggaulro, Sangnok-gu, Ansan, Gyeonggi-do 15588, Republic of Korea
b Department of Industrial Engineering, Sungkyunkwan University, 2066, Seobu-ro, Jangan-gu, Suwon, Gyeonggi-do 16419, Republic of Korea
c Department of Systems Management Engineering, College of Engineering, Sungkyunkwan University, 2066, Seobu-ro, Jangan-gu, Suwon, Gyeonggi-do 16419, Republic of Korea
ARTICLE INFO

Keywords: Digital human; Modeling; Multi-Kinects; Simulation

ABSTRACT
Many researchers in human engineering seek to enhance work efficiency and reduce workloads by analyzing worker load, work efficiency, and production using analogue methods such as checklists. Recent analyses have examined job performance using digital human models created by digitally recording the movements of actual workers through keyboard- and mouse-based data input. However, this modeling method has two problems: it takes too much time to model all movements, and accuracy depends on the modeling technician. To solve these problems, this study presents a digital human modeling automation system that uses Kinect, a Microsoft Xbox data input device, to capture human movements. The system is designed to utilize multiple Kinects; a dedicated data model conveys and manages the data from these devices, and a calibration algorithm integrates them into a single digital human model. Through this system, a digital human model can be generated easily, accurately, inexpensively, and efficiently. The developed digital human modeling automation system is verified using three scenarios, and the results, limitations, and future development plan of the study are described.
1. Introduction

As competition among manufacturers increases and product life cycles become shorter, manufacturers are making diverse efforts to improve the efficiency of product manufacturing and to reduce costs. Optimization is focused not only on facilities but also on the workers who assemble products by using or operating these facilities. Human engineering attempts to enhance work efficiency and reduce workload by analyzing worker load, work efficiency, and production [1,2]. Previous analyses in human engineering were carried out using analogue methods based on checklists, but current analysis is based on a digital model [3] created by digitally recording the movements of actual workers. Early digital human models were created manually through keyboard- and mouse-based data entry [4], which had two significant limitations: the time required to model entire movements, and accuracy that depended on the modeling technician. To solve these problems, an automated modeling method using a Vicon camera [5] has been suggested to quickly and accurately model movements, although the cost of the camera is high. Studies on modeling automation have therefore also used Kinect cameras, low-cost gaming devices [3] that are less accurate than the Vicon camera. In this study, to solve the problem of low accuracy when using one Kinect camera, several Kinect cameras were used for modeling automation.
A data model was suggested to combine data from multiple Kinect cameras, and a system to integrate skeleton data from the Kinects through a virtual reality peripheral network (VRPN) was devised [6]. The integrated data were input as basic information into Jack, a human engineering software package, to analyze diverse actions. The results were verified through a modeling test using the developed system.

2. Research background

2.1. Modeling automation

Two methods for modeling worker movements using digital-environment information have been used: the active sensing method, which acquires movement information through sensors attached to workers' bodies, and the passive sensing method, which uses a camera to analyze movements from a distance [7]. The active sensing method provides accurate information, but attaching sensors to subjects is time consuming. A typical active sensing device is the Xsens MVN made by Xsens Co. [8], which accurately records movement using inertial sensors attached to the whole body. The passive sensing approach records movement information using a camera, such as the Vicon, which uses two-dimensional images of a subject with markers attached to the joints to record movement [5]. An improvement on this technology is Kinect, which uses depth sensors to record movement without any attachments [9].
⁎ Corresponding author.
E-mail addresses: [email protected], [email protected] (C. Jun), [email protected] (J.Y. Lee), [email protected] (B.H. Kim), [email protected] (S.D. Noh).
https://doi.org/10.1016/j.rcim.2018.03.014
Received 15 April 2017; Received in revised form 2 November 2017; Accepted 31 March 2018
0736-5845/ © 2018 Elsevier Ltd. All rights reserved.
2.2. Ergonomics studies using Kinect

Kinect is a console device for game play developed by Microsoft. Because of its low price and free API for joint extraction, it is widely used in human-engineering studies on tracking, pose estimation, and recognition. For example, Schönauer [10] studied continuous tracking of the movement of a subject in a certain space, as shown in Fig. 1. When several Kinect cameras were used to track the subject, the data from the Kinects overlapped, as shown by the purple regions in Fig. 1. To solve this problem, Schönauer gave a weight to each Kinect. Martin [11] used Kinect to measure worker loads in an educational situation and suggested a warning system to indicate loads that are too large. He proved the feasibility of Kinect devices for human engineering by comparing the results from Kinect to conventional human-engineering analyses using Occupational Safety and Health Administration (OSHA) guidelines and recommended weight limits (RWL). Martin also pointed out a number of limitations to the comprehensive use of Kinect in human-engineering studies, such as unrecognized joint movement when objects are hidden and the failure of the system when two Kinect cameras are linked together.

Fig. 1. Human tracking study using Kinect [10].
2.3. VRPN

The VRPN is a method suggested by Taylor [6] to realize virtual reality through the communication protocols of multiple devices. Diverse input devices, such as gloves, joysticks, and cameras, are used to achieve this virtual reality. The method was suggested to solve problems related to interference and event timing while receiving data from multiple devices: independent data-input environments for each device are connected through a network transmission control protocol/internet protocol (TCP/IP) to check the data times from the different devices and to integrate the data through a simple interface, enabling the use of the information in a virtual environment. Fig. 2 shows the basic architecture of the VRPN methodology, which is composed of the input devices, such as a Kinect or joystick; the VRPN server, which acquires information from the input devices; the VRPN client, which integrates this information; and the application, which expresses the integrated input information (a minimal stand-in sketch of this server–client flow is given after Fig. 2). In this study, the experiment environment was created using multiple Kinect cameras and VRPN servers, and the skeleton data acquired from the servers were integrated in the VRPN client and presented through a virtual reality application (i.e., Jack).
Fig. 2. VRPN architecture [6].
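To make the server–client decoupling concrete, the following minimal Python sketch mimics this flow, using a plain TCP/IP socket stream as a stand-in for the VRPN transport. The real VRPN library is a C++ framework, so the packet format and function names here are illustrative rather than the actual VRPN API; only the port number, 3883, is VRPN's commonly used default.

```python
import json
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 3883  # 3883 is VRPN's usual port; everything else is illustrative

def skeleton_server(n_frames: int = 3) -> None:
    """Stand-in for a VRPN server: streams timestamped joint data over TCP/IP."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            for frame in range(n_frames):
                packet = {
                    "device": "Kinect-1",
                    "time": time.time(),  # event time, used to align multiple devices
                    "joints": {"HipCenter": [0.0, 0.9, 2.0 + 0.01 * frame]},
                }
                conn.sendall((json.dumps(packet) + "\n").encode())
                time.sleep(1 / 30)        # Kinect delivers roughly 30 frames per second

def skeleton_client() -> None:
    """Stand-in for the VRPN client: receives packets and hands them to the application."""
    with socket.create_connection((HOST, PORT)) as sock:
        for line in sock.makefile():
            packet = json.loads(line)
            print(packet["device"], packet["time"], packet["joints"])

if __name__ == "__main__":
    threading.Thread(target=skeleton_server, daemon=True).start()
    time.sleep(0.2)  # give the server a moment to start listening
    skeleton_client()
```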
3. Multi-Kinect calibration system

In this study, an automated modeling system was developed to carry out a human-engineering analysis using multiple Kinects and the VRPN methodology. To this end, this study proposes a skeleton data model as a framework for exchanging data throughout the system and a calibration algorithm to integrate the human skeleton data input from several Kinects.

3.1. Multi-Kinect calibration system architecture

Fig. 3 shows the architecture of the system designed to receive skeleton data from multiple Kinects, integrate these data, and send the results to the application. Similar to the architecture in Fig. 2, the system is composed of the data input device (i.e., Kinect), the VRPN server, the VRPN client, and the application. The algorithm to integrate the data from multiple Kinects was applied in the VRPN client. Three Kinects were used, but the system was designed so that the number of Kinects can be increased if necessary. Each Kinect was connected to one VRPN server, which sent skeleton data from the Kinect to the VRPN client through a TCP/IP network. Skeleton data from the multiple devices were sent through the network and stored in the form of a pre-defined data model. The VRPN client module received data from each VRPN server, and the transferred data passed to the user after conversion through four modules. The calibration module converted the data received from the several motion cameras into a single coordinate system, while the angle-check module calculated the angle between the subject and each camera to assign weights. The length-check module compared the length of each joint to the joint length recorded in the initial setting process to determine whether or not there was an abnormality. Data from the several Kinects were converted into one skeleton dataset through a data combining module. The application transformation module converted the derived skeleton dataset into a data format that could be used in user applications; an inverse kinematics method was used to generate data suitable for Jack, a commercial ergonomic simulation tool. Each module and the skeleton data model are described in Sections 3.2 and 3.3, and a sketch of the client-side pipeline is given after Fig. 3.
Fig. 3. System architecture.
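As mentioned above, the client-side flow can be read as a simple pipeline. The following Python sketch shows only the structure of the four modules described in Section 3.1; the function bodies are placeholders, and all names and the frame format are illustrative rather than taken from the paper's implementation.

```python
from typing import Dict, List

# A frame maps a joint name to its [x, y, z] position for one device.
Frame = Dict[str, List[float]]

def calibrate(frame: Frame) -> Frame:
    """Calibration module: convert device-local coordinates to the shared frame."""
    return frame  # placeholder: a real module applies the camera's rigid transform

def check_angle(frame: Frame) -> Frame:
    """Angle-check module: assign a weight from the subject-to-camera angle (Section 3.3)."""
    return frame

def check_length(frame: Frame) -> Frame:
    """Length-check module: flag joints whose bone lengths deviate from the initial setting."""
    return frame

def combine(frames: List[Frame]) -> Frame:
    """Data combining module: fuse the per-device frames into one skeleton dataset."""
    return frames[0]  # placeholder: a real module computes the weighted average

def transform_for_application(frame: Frame) -> Frame:
    """Application transformation module: convert to the target format (e.g., Jack)."""
    return frame

def client_pipeline(per_device_frames: List[Frame]) -> Frame:
    """VRPN-client flow: per-device checks, then fusion, then application conversion."""
    checked = [check_length(check_angle(calibrate(f))) for f in per_device_frames]
    return transform_for_application(combine(checked))
```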
3.2. Data model

The most significant problem in integrating skeleton data from multiple Kinects is the need to synchronize the data for each joint. Data for 30 frames per second for each of the 20 joints were sent to the VRPN client, making it difficult to integrate the data into an accurate form without precise synchronization. In this study, data models to store the skeleton data from the devices per unit time were suggested, as shown in Table 1, and were applied to the system. The Human_Skeleton class was designed as a list-type class to input information from multiple Kinects and comprises DeviceName, DeviceAddress (IP address), DeviceLocation, and Human_Joint_Data. DeviceLocation holds the position of a device; it was calculated and input as absolute coordinates during the initial calibration, to be used as basic data for later integration. Human_Joint_Data contains the data-input time, angle, and validity information, as well as the Human_Joint_List, which includes JointType, Quaternion, Skel_point, Distance, and Validity information, allowing the coordinate position of each joint to be saved in a list. For JointType, the 20 joint types sent from the Kinect were defined as an enumeration; the actual coordinate points are stored in Skel_point. Distance provides the distance between the camera and subject, while Validity confirms the accuracy of the coordinate position: if the position was not correct, it was marked false and was not used for calibration. A code rendering of this hierarchy is sketched after Fig. 4.

Table 1
Skeleton data class.

Level 0          Level 1            Level 2            Level 3       Type
Human_Skeleton                                                       List<>
                 DeviceName                                          String
                 DeviceAddress                                       String
                 DeviceLocation                                      double[]
                 Human_Joint_Data                                    List<>
                                    Time                             String
                                    Angle                            double
                                    Validity                         boolean
                                    Human_Joint_List                 List<>
                                                       JointType     JointType
                                                       Quaternion    double[]
                                                       Skel_point    double[]
                                                       Distance      double
                                                       Validity      boolean

Fig. 4. Architecture of Human_Skeleton, Human_Joint_Data, and Human_Joint_List classes.
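Below is one possible Python transcription of Table 1; the class and field names follow the table, while the concrete types, defaults, and comments are assumptions based on the surrounding text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class JointType(Enum):
    """The 20 Kinect joint types, defined as an enumeration; three shown here."""
    HIP_CENTER = 0
    SHOULDER_LEFT = 1
    SHOULDER_RIGHT = 2
    # ... remaining 17 joints

@dataclass
class HumanJointList:
    joint_type: JointType
    quaternion: List[float]       # joint orientation, double[] in Table 1
    skel_point: List[float]       # actual joint coordinates
    distance: float               # camera-to-subject distance
    validity: bool                # False -> excluded from calibration

@dataclass
class HumanJointData:
    time: str                     # data-input time, used for synchronization
    angle: float                  # subject-to-camera angle, used for weighting
    validity: bool
    joints: List[HumanJointList] = field(default_factory=list)

@dataclass
class HumanSkeleton:
    device_name: str
    device_address: str           # IP address of the device's VRPN server
    device_location: List[float]  # absolute coordinates from initial calibration
    joint_data: List[HumanJointData] = field(default_factory=list)
```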
3.3. Data calibration algorithm

To determine the accuracy of a coordinate, this study used two validation methods. First, the distance values between joints were obtained during the initial calibration of the system, and any measured distance longer or shorter than the corresponding value during the real-time analysis was marked as false. Second, the angle between the camera and subject was measured. According to Obdržálek [12], results contain significant errors when this angle is greater than 60°; therefore, when the angle was greater than 60°, a low weight was assigned in the angle-check module, as shown in Fig. 5. Then, a function was applied to integrate the skeleton data from the multiple Kinects into one coordinate value, which was sent to the VRPN client through the VRPN servers and the TCP/IP network. All values arrived in the local coordinate system of each Kinect and had to be integrated to obtain a unified coordinate value. Fig. 5 shows the flow of the data calibration process. For each camera, the chest points, comprising the right shoulder, left shoulder, and hip center, lie on one plane; calibration was carried out by rotating each camera plane to the basic plane. To solve the problem of the Kinect camera being unable to distinguish the front of a subject from the back during the simulation, the face recognition function of the Kinect software development kit (SDK) was used during the initial stage of the simulation to recognize the subject's face; then a normal vector, perpendicular to the plane containing both shoulder points and the hip center, was created to allow the system to recognize the relevant direction as the front of the subject. The following equation gives the joint-specific position value to which the weight was applied:
$$P_i = \frac{\sum_{n=1}^{k} w_i^n \, p_i^n}{\sum_{j=1}^{k} w_i^j},$$
where $i$ is the joint number, $k$ is the number of Kinect cameras, $P_i$ is the weighted position value of joint $i$, $w_i^n$ is the weight of joint $i$ for Kinect $n$, and $p_i^n$ is the position value of joint $i$ from Kinect $n$. The unified coordinate values were linked to the commercial human-engineering software, for which the inverse kinematics methodology was adopted. Every three-dimensional point value for each joint was converted into a vector giving the offset from one joint point to the next, which was in turn converted into the rotation values of the Jack simulation model so that the data could be visualized in Jack. The inverse kinematics methodology also filtered invalid coordinate values before the data were sent to the human-engineering simulation tool; this filtering occurred in the application transformation module. Fig. 5 shows the process from the acquisition of data from the Kinects to the integration of the data by the human-engineering simulation program. Data conversion, camera calibration, joint-length calculation, and the weighting process were all performed in the calibration module, and after calibration for each camera, one skeleton dataset was compiled based on the joint-weight and initial joint-length information.

Fig. 5. Data calibration process flow.
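A minimal sketch of this weighted fusion, written in Python with NumPy, is given below. The 60° angle threshold and the bone-length check follow Section 3.3, while the concrete weight values, the length tolerance, and the example data are illustrative assumptions.

```python
import numpy as np

ANGLE_LIMIT_DEG = 60.0             # beyond this subject-to-camera angle, trust the camera less [12]
LOW_WEIGHT, HIGH_WEIGHT = 0.2, 1.0 # illustrative weight values
LENGTH_TOLERANCE = 0.05            # assumed 5 cm deviation from calibrated bone length -> invalid

def camera_weight(angle_deg: float) -> float:
    """Angle-check module: assign a low weight when the view angle exceeds 60 degrees."""
    return LOW_WEIGHT if angle_deg > ANGLE_LIMIT_DEG else HIGH_WEIGHT

def length_valid(p_a: np.ndarray, p_b: np.ndarray, calibrated: float) -> bool:
    """Length-check module: the bone length must match the initial calibration."""
    return abs(np.linalg.norm(p_a - p_b) - calibrated) <= LENGTH_TOLERANCE

def fuse_joint(positions: np.ndarray, angles: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Combine one joint's readings from k cameras: P_i = sum(w * p) / sum(w).

    positions: (k, 3) joint position per camera, already in the unified frame
    angles:    (k,)   subject-to-camera angle in degrees
    valid:     (k,)   result of the length check; invalid readings get weight 0
    """
    weights = np.array([camera_weight(a) for a in angles]) * valid
    if weights.sum() == 0.0:
        raise ValueError("no valid observation for this joint")
    return weights @ positions / weights.sum()

# Example: three cameras observing the hip center
pos = np.array([[0.02, 0.90, 2.00], [0.00, 0.91, 2.01], [0.05, 0.88, 1.97]])
ang = np.array([15.0, 40.0, 72.0])  # the third camera sees the subject obliquely
ok = np.array([1.0, 1.0, 1.0])
print(fuse_joint(pos, ang, ok))     # weighted toward the two frontal cameras
```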
4. Case study

The movements of automobile manufacturing workers, recorded in two-minute intervals, were used to verify this research, based on Joung's methodology [13]. The movement of a manufacturing worker in Joung's study [13] included four types of actions: stopping, rotation, position changing, and arm movement. The relevant movements were modeled manually using the conventional methodology and automatically through the proposed system, and the results were compared with respect to time and accuracy. To calculate the accuracy of each model, the rapid entire body assessment (REBA) [14] was used to compare the traditional human-engineering results to the scenario results. Scenario 1 used the existing ergonomic simulation modeling methodology: after video was taken, analysts modeled the data and input them using a computer mouse. Scenarios 2 and 3 verified modeling accuracy based on the number of Kinect cameras; Scenario 2 was modeled using a single Kinect camera, and Scenario 3 was modeled using three Kinect cameras. The results of each scenario are shown in Table 2, and detailed descriptions are given in Sections 4.1–4.3.
Table 2
Case study scenario results.

Situation     Time (sec)                           Accuracy
              Total      Movement     Modeling
Scenario 1    1200 s     300 s        900 s        ● (100%)
Scenario 2    300 s      –            –            (60%)
Scenario 3    300 s      –            –            ● (100%)
4.1. Scenario 1: conventional human-engineering simulation

Scenario 1 used the traditional human-engineering simulation methodology: ergonomic simulation software was used to model, with a computer mouse, the subject's motion captured by a video camera. It took 20 minutes to model two minutes of movement; arm and hand movements took longer to model than location movement because their detail requires multiple mouse clicks and because the subject was sometimes hidden from the camera. If arm and hand movements exceed those of the current example, even more modeling time is required. The accuracy of Scenarios 2 and 3 was calculated against the modeling data from Scenario 1; each model dataset was analyzed using the REBA methodology, and the results were compared.

4.2. Scenario 2: simulation using one Kinect camera

In Scenario 2, modeling was performed using the developed system and one Kinect camera. Two minutes of movement required five minutes of modeling. Computer analysis was conducted, but, as in Scenario 1, detailed arm and hand motion was difficult to distinguish from gross movement during modeling. Because only one Kinect was used, a modeling error occurred whenever motion was blocked by other parts of the body, and when a modeling error occurred, the modeled movement slowed down or became disconnected. The simulation results were compared to those of Scenario 1; 40% of the analysis results differed, so the analysis accuracy was calculated at 60%.

4.3. Scenario 3: simulation using multiple Kinect cameras

In Scenario 3, three Kinect cameras were arranged with distances between the cameras of 4 m, 4 m, and 3.5 m. Modeling time was five minutes, and the automation precision was the same as in Scenario 1. Scenario 3 produced movement very similar to that of the traditional human-engineering simulation model, which means that this methodology is appropriate for use in future human-engineering studies. The REBA results for Scenario 3 were also the same as those for Scenario 1, and accuracy was calculated at 100%. Scenarios 2 and 3 improved simulation times by 900 s (75%) compared to Scenario 1, which used the conventional method. For Scenario 2, motion was difficult to analyze because the angles between the subject and the camera during movement hid parts of the subject from the camera; thus, precision was 60%, lower than that of Scenario 1. For Scenario 3, three Kinect cameras were used, and precision was 100%, the same as the precision of Scenario 1.
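The accuracy figures in Table 2 can be read as percent agreement between a scenario's REBA results and the Scenario 1 reference. A minimal Python sketch of such a comparison follows; the sample REBA scores are invented purely for illustration.

```python
def reba_agreement(reference: list[int], candidate: list[int]) -> float:
    """Percent of sampled postures whose REBA scores match the reference model."""
    if len(reference) != len(candidate):
        raise ValueError("score sequences must cover the same posture samples")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return 100.0 * matches / len(reference)

# Invented example: REBA scores for ten sampled postures
scenario1 = [2, 3, 5, 4, 2, 6, 3, 3, 4, 2]  # manually modeled reference
scenario2 = [2, 3, 4, 4, 2, 5, 3, 2, 4, 3]  # single-Kinect model
print(reba_agreement(scenario1, scenario2))  # -> 60.0
```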
5. Conclusion

Human-engineering simulation reduces product development costs and improves quality by supporting decision making during manufacturing processes in relation to safety, the working environment, and interference with products. To conduct a human-engineering simulation, information about production is required, such as the physical environment of the plant and the movements of workers during production. Compiling workers' movement information has been a significant limitation to using human-engineering simulation because it requires extensive attention to detail and is time consuming. Diverse auto-modeling techniques using joint markers and Vicon cameras have been developed; however, such techniques are difficult for companies to use because of space limitations and the related expense. In this study, an automated method to model workers' movements was suggested for human-engineering simulation. The system consists of VRPN servers that capture workers' movement information from multiple Kinects, a VRPN client module that receives the movement information through a TCP/IP network, checker modules that determine whether the movement information is correct, a calibration module that integrates the datasets from the multiple Kinects, and a conversion module that converts the movement information into a Jack simulation. To verify the program developed in this study, the workflow and results of the conventional system were compared to those of the proposed system. The proposed model saves time and cost when performing worker modeling, which have been problematic issues for previous human-engineering simulation methodologies.

Acknowledgments

This work was supported by the WC300 Project (S2367439, Smart Rail & Smart Manufacturing System Development) funded by the Ministry of SMEs and Startups, and by the Technology Innovation Program (10080296, Development of Advanced Operation Management System for Smart Factory Based on Clean Energy) funded by the Ministry of Trade, Industry & Energy, Republic of Korea. This support is gratefully acknowledged.
References
[1] S. Pheasant, C.M. Haslegrave, Bodyspace: Anthropometry, Ergonomics and the Design of Work, CRC Press, 2016.
[2] N.A. Stanton, M.S. Young, C. Harvey, Guide to Methodology in Ergonomics: Designing for Human Use, CRC Press, 2014.
[3] D. Regazzoni, C. Rizzi, Digital human models and virtual ergonomics to improve maintainability, Comput. Aided Des. Appl. 11 (2014) 10–19.
[4] H. Demirel, V. Duffy, Applications of digital human modeling in industry, Digit. Hum. Model. 4561 (2007) 824–832.
[5] V. Peak, Vicon Motion Capture System, Lake Forest, CA, 2005.
[6] R.M. Taylor II, T.C. Hudson, A. Seeger, H. Weber, J. Juliano, A.T. Helser, VRPN: a device-independent, network-transparent VR peripheral system, Proceedings of the ACM Symposium on Virtual Reality Software and Technology, Alberta, Canada, 2001, pp. 55–61.
[7] T.B. Moeslund, A. Hilton, V. Krüger, A survey of advances in vision-based human motion capture and analysis, Comput. Vis. Image Underst. 104 (2006) 90–126.
[8] D. Roetenberg, H. Luinge, P. Slycke, Xsens MVN: full 6DOF human motion tracking using miniature inertial sensors, white paper, Xsens Motion Technologies BV, 2009. www.xsens.com/images/stories/PDF/MVN_white_paper.pdf.
[9] J.H. Lee, Advanced Human Body Tracking Method Using Multiple Kinect Sensors, Sejong University, 2014.
[10] C. Schönauer, H. Kaufmann, Wide area motion tracking using consumer hardware, Proceedings of the ACM Advances in Computer Entertainment Technology Conference (ACE 2011), Lisbon, Portugal, 2011, pp. 57–65.
[11] C.C. Martin, D.C. Burkert, K.R. Choi, N.B. Wieczorek, P.M. McGregor, R.A. Herrmann, P.A. Beling, A real-time ergonomic monitoring system using the Microsoft Kinect, Proceedings of the Systems and Information Engineering Design Symposium (SIEDS), IEEE, Charlottesville, USA, 2012, pp. 50–55.
[12] Š. Obdržálek, G. Kurillo, F. Ofli, R. Bajcsy, E. Seto, H. Jimison, M. Pavel, Accuracy and robustness of Kinect pose estimation in the context of coaching of elderly population, Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), San Diego, USA, 2012, pp. 1188–1193.
[13] Y.-K. Joung, S.D. Noh, Integrated modeling and simulation with in-line motion captures for automated ergonomic analysis in product lifecycle management, Concurr. Eng. 22 (2014) 218–233.
[14] S. Hignett, L. McAtamney, Rapid entire body assessment (REBA), Appl. Ergon. 31 (2000) 201–205.