ISPRS Journal of Photogrammetry and Remote Sensing 85 (2013) 56–65
Automatic techniques for 3D reconstruction of critical workplace body postures from range imaging data

Patrick Westfeld a,*, Hans-Gerd Maas a, Oliver Bringmann b, Daniel Gröllich c, Martin Schmauder c

a Institute of Photogrammetry and Remote Sensing, Technische Universität Dresden, D-01062 Dresden, Germany
b kubit GmbH, Tiergartenstraße 79, D-01219 Dresden, Germany
c Institute of Material Handling and Industrial Engineering, Technische Universität Dresden, D-01062 Dresden, Germany
Article history: Received 17 January 2013; Received in revised form 15 August 2013; Accepted 15 August 2013; Available online 13 September 2013.

Keywords: Range camera; Least squares tracking; Potential energy model surface fit; Human motion; CAD manikin; Awkward body postures

Abstract

The paper shows techniques for the determination of structured motion parameters from range camera image sequences. The core contribution of the work presented here is the development of an integrated least squares 3D tracking approach based on amplitude and range image sequences to calculate dense 3D motion vector fields. Geometric primitives of a human body model are fitted to time series of range camera point clouds using these vector fields as additional information. Body poses and motion information for individual body parts are derived from the model fit. On the basis of these pose and motion parameters, critical body postures are detected. The primary aim of the study is to automate ergonomic studies for risk assessments regulated by law, identifying harmful movements and awkward body postures in a workplace.

© 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
1. Introduction

In many jobs, work life is full of awkward postures. This may be anything from overhead work, bending, squatting or kneeling to a combination of critical postures of the head, torso and extremities (Fig. 1). In Germany, in 2012 around 4.8 million employees worked in unfavourable, forced postures (Brenscheidt et al., 2012). With a share of 24.4%, musculoskeletal disorders cause most of the days of incapacity to work, and with a 9.1 billion euro loss of production, they form a great entrepreneurial potential for prevention (Brenscheidt et al., 2012). In order to identify and prevent or at least reduce harmful effects on human health, ergonomic workplace and risk analyses are required by German and European law (Machinery Directive 2006/42/EC; German Occupational Safety and Health Act). Risk analyses are usually realised by well-trained experts who interactively analyse surveyed field observation data (e.g. Ellegast, 2005; Schmitter, 2005; Hückstädt et al., 2007). Taking aspects of efficiency and objectiveness into account, the data-intensive documentation of body postures in real work processes and the corresponding data evaluation present a challenge. So far, so-called paper-and-pencil tests or sometimes video-aided methods are only gradually being replaced by automated measuring techniques.
The non-commercial measurement system CUELA is capable of documenting specific working situations using special sensors fixed on the clothing of the proband (Ellegast, 1998). Body and joint movements are detected by means of inertial sensors and potentiometers. With a total weight of several kilos, CUELA may severely impair the process to be analysed. The very high personnel and time effort is a further disadvantage. Due to its complexity, the method is also only accessible to professionals in the fields of ergonomics or occupational medicine. The analysis of stereoscopic image sequences from multiocular camera systems allows a 3D reconstruction of movements and provides a basis for detailed motion analyses. Automatic image sequence processing methods can be used to improve both the efficiency and the objectivity of videography data analysis. Under certain conditions, real-time processing can be achieved (Maas, 1997). However, the spatiotemporal image matching can have high complexity and does not always yield error-free results. Commercially available motion capture systems make use of active or passive targeting, e.g. with retro-reflective materials, but come with the extra effort of attaching targets to a proband and installing and calibrating a multi-camera system. Targeting may also affect and restrict the natural flow of movement in a real working environment. The US corporation Motion Analysis Corporation (2012) and the German company SIMI Reality Motion Systems GmbH (2013) distribute such optical measuring systems commercially.
http://dx.doi.org/10.1016/j.isprsjprs.2013.08.004
Fig. 1. Awkward postures: (a) Working overhead with the arms raised. (b) Kneeling. (c) Repetitive lifting and moving of heavy loads.
This publication contributes to the automation of risk analyses, in particular in terms of increasing efficiency and objectiveness as well as accuracy and reliability. Further characteristics of the automatic techniques developed here for the 3D reconstruction of critical body postures in workplaces are the demand for non-contact surveys, the reduction of observational influences, the relief of human resources and the increase in the spatiotemporal level of detail. There is a large number of different methods for ergonomic evaluation and assessment. Section 2 investigates which method is suitable for automation. The model required to mathematically describe a human motion sequence is introduced in Section 3. Delivering spatially resolved surface data at video rate with limited instrumental effort, range cameras represent an interesting alternative for data acquisition in the field of 3D motion analysis (Section 4). Section 5 describes a combination of bottom-up and top-down approaches to calculate consecutive parameter sets of an articulated human model and to map arbitrary postures from range camera image sequences. In a first instance, motion vectors are calculated by 2.5D least squares tracking for all 3D body points. These motion data provide a valuable basis for the subsequent potential energy surface model fit. Finally, a risk model is defined to assess and document the entire body posture resp. single body parts (Section 6).

2. Methods assessing hazards at work

Ergonomic studies are regularly carried out in the labour sciences to improve safety and ergonomics in the workplace (EU, 1997). In this context, observational methods are used to identify hazards at work such as awkward postures, forceful exertion or potentially harmful repetitive movements. Common methods for risk assessment are, e.g.,
OWAS (Ovako Working Posture Assessment System; Karhu et al., 1977; Kivi and Mattila, 1991) or RULA (Rapid Upper Limb Assessment; McAtamney and Corlett, 1993) for awkward body posture monitoring, or Kilbom's (1994) guidelines to control repetitive work tasks.

OWAS allows the evaluation of static and dynamic postures. It can be applied at mobile and stationary workplaces. Besides the weight of the load handled, OWAS is based upon the identification of the most common work postures for the back, arms and legs (Fig. 2), and their duration. The spatial position of body parts and the length of stay are criteria which can be captured well by an optical measurement system. In contrast, other methods for risk assessment include additional parameters like recovery times, job rotation or state of health. OWAS is therefore particularly suitable for automation in a vision-based approach. Solely the body posture description is qualitative, e.g. 'one elbow on or above shoulder height'. The automatic procedure for the 3D reconstruction of awkward body postures in the workplace developed within this contribution requires quantifying evaluation rules. An established criteria catalogue defines metric domains for each basic OWAS working posture in terms of joint angles and anatomically relevant body points (e.g. 'flexion of the upper body when standing >10° starting from the vertical body centre corresponds to basic working posture Nº2'; Fig. 2(c)). Comfort levels as well as extreme joint angle limits are described in an additional risk model (Section 6).

3. Simplified CAD manikin

A 3D human model describes the human locomotive system in terms of its proportions, topology and repertoire of movements. The manikin used here is based on CharAT Ergonomics (Kamusella and Schmauder, 2009; Ördögh et al., 2011), an Autodesk® 3ds Max plugin for ergonomic simulations developed by Virtual Human Engineering GmbH (Stuttgart, Germany) in cooperation with the professorship of Labour Sciences (TU Dresden, Germany). The individual body parts of the human model are available as 3D meshes (Fig. 3). The vectorised CAD surfaces can then be sampled on a regular grid into 3D Cartesian point cloud coordinates. A local 3D coordinate system is defined for each body segment s with its origin X0^s(X, Y, Z) at the main pivot point of the joint (Fig. 3). These local coordinate systems are rigidly linked to the relevant limbs, which means that all X0^s are constants. The articulation of the limbs can be described by relative rotations between the local coordinate systems using three varying Euler angles (ω, φ, κ)^s. Furthermore, the current location of one body point, for example a hip point X^hip, is required in global coordinates.
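The segment-wise parameterisation above can be sketched numerically: each segment's sampled points are rotated about the joint pivot X0^s by the three Euler angles and then placed in space. The sketch below assumes a Z-Y-X rotation order, which the paper does not specify; function and variable names are hypothetical.

```python
import numpy as np

def euler_rotation(omega, phi, kappa):
    """Rotation matrix R = Rz(kappa) @ Ry(phi) @ Rx(omega).
    (The axis order is an assumption; the paper only names the angles.)"""
    cx, sx = np.cos(omega), np.sin(omega)
    cy, sy = np.cos(phi), np.sin(phi)
    cz, sz = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def transform_segment(points_local, X0, omega, phi, kappa):
    """Rotate sampled segment points about the joint pivot and place
    them globally: X_global = X0 + R @ X_local (points are rows)."""
    R = euler_rotation(omega, phi, kappa)
    return X0 + points_local @ R.T
```

Chaining such transforms along the kinematic tree (pelvis to torso to head, and so on) yields the global pose of the whole manikin from the fixed pivots and the varying angles.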
The human model is scaled globally on the basis of a first initial frame due to the assumption of constant proportions. In detail, the formal description of a human model M can be given as follows:

M = ( Fixed parameters X0^s;
      Varying parameters (ω, φ, κ)^s;
      Body part representation in AutoCAD®;
      Strategy for parameter variation;
      Global constraints and additional information )    (1)
Fig. 2. Some working postures in the category back: (a) Upstanding position. (b) Extension. (c) Flexion. (d) Inclination. (e) Rotation. The red line defines the basic and the green one the actual position in the corresponding view. The circles mark anatomically relevant body points. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Body segments of a simplified meshed CAD manikin in local 3D coordinate system: (a) Head. (b) Torso. (c) Pelvis. (d) Arms. (e) Legs.
Any posture within the repertoire of movement is described mathematically by this consistent set of parameters. Parameter variation is carried out using the optimisation method presented in Section 5.4, taking anatomical constraints like joint angle limits or inconsistent extremity intersections as well as additional motion information into consideration. In detail, the strategy for parameter variation contains information on how the optimisation method can vary the human model parameters. A co-domain is defined for every parameter. For example, angles between two single body parts can only take values in a specific range, whereas some joints (e.g. upper arm to torso) can have a full range of 360°. In addition to the co-domain, a small incremental value is stored, which is later used for parameter variations. The most important global constraint is to avoid overlapping of body parts. These constraints are tested after every optimisation step, and the last parameter variation is rejected if one constraint is not fulfilled. Additional information is included in terms of displacement vectors (Section 5.3). The human model is recalculated and duplicated in the CAD environment on the basis of the resulting set of parameters.
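The parameter variation strategy described above (per-parameter co-domains, small increments, constraint testing after every step with rejection on failure) can be sketched as a simple rejection loop. All names, co-domains and increments below are hypothetical placeholders, not values from the paper.

```python
import random

# Hypothetical co-domains (radians) and increments per parameter.
co_domain = {"elbow_flex": (0.0, 2.5), "shoulder": (-3.14, 3.14)}
increment = {"elbow_flex": 0.02, "shoulder": 0.02}

def propose(params):
    """Vary one parameter by +/- its increment, clamped to its co-domain."""
    name = random.choice(list(params))
    lo, hi = co_domain[name]
    step_size = random.choice((-1, 1)) * increment[name]
    new = dict(params)
    new[name] = min(hi, max(lo, new[name] + step_size))
    return new

def step(params, cost, constraints_ok):
    """One optimisation step: the last parameter variation is rejected
    if a global constraint (e.g. overlapping body parts) is violated
    or the optimisation criterion does not improve."""
    cand = propose(params)
    if not constraints_ok(cand) or cost(cand) >= cost(params):
        return params
    return cand
```

Repeating `step` drives the parameter set towards a minimum of the cost while never leaving the anatomically admissible region.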
4. Sensor

The use of modulation techniques enables time-of-flight range imaging cameras (ToF, RIM, 3D camera; Fig. 4(a); Schwarte et al., 1999) based on photonic mixer devices (PMD; Spirig et al., 1995;
Fig. 4. Range camera workplace monitoring: (a) All data in this article are captured by a range camera PMD[vision]® CamCube 2.0 (PMDTec, 2010). (b) A high risk working posture is acquired as a 3D point cloud with colour-coded depth information overlay. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Schwarte, 1996) or similar principles to simultaneously produce both an amplitude image and a range image. A modulated near-infrared signal emitted by light-emitting diodes (LEDs) illuminates the scene. The invisible light is reflected back from the object surface to the camera's CCD/CMOS sensor, where a distance-based charge carrier separation can be performed pixelwise. Each pixel thus becomes an electro-optical distance-measuring instrument according to the phase difference principle. Range cameras deliver spatially resolved surface data at video rate without the need for stereo matching (Fig. 4(b)). In the case of 3D motion analyses in particular, this leads to considerable reductions in system complexity and computing time. Range cameras combine the flexibility of a digital camera with the 3D data acquisition potential of conventional surface measurement systems. Despite the relatively low spatial resolution currently achievable, as a mono-sensorial real-time depth image acquisition system they represent an interesting alternative in the field of 3D motion analysis. Range cameras may be suitable for many applications in human motion analysis. Several publications describe the detection and estimation of gestures in range camera image sequences (Breuer et al., 2007; Holte et al., 2010; Westfeld, 2012). Westfeld (2007, 2012) and Hempel and Westfeld (2009) determine body orientations and inter-personal distances between two probands for human behaviour analyses. Meers and Ward (2008) determine head positions in real time, and Jensen et al. (2009) investigate human motion on a treadmill. Diraco et al. (2010) present a system for fall detection of senior people in the home environment. Breidt et al. (2010) model faces from range camera data. Also, the capabilities of human–computer interaction (HCI) are expanded using RIM sensors (Lahamy and Lichti, 2012).
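The phase difference principle described above can be illustrated with a small calculation: since the modulated signal travels to the object and back, the distance follows from the measured phase shift as d = c·Δφ/(4π·f_mod), with an unambiguous range of c/(2·f_mod). The 20 MHz used below is a typical PMD modulation frequency, assumed for illustration.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def phase_to_distance(delta_phi, f_mod):
    """Distance from the measured phase shift delta_phi (radians) of a
    signal modulated at f_mod (Hz); the factor 4*pi accounts for the
    two-way travel of the light."""
    return C * delta_phi / (4.0 * math.pi * f_mod)

def ambiguity_range(f_mod):
    """Unambiguous measurement range c / (2 * f_mod)."""
    return C / (2.0 * f_mod)
```

At 20 MHz the unambiguous range is about 7.5 m, which comfortably covers a workplace monitoring scene at the distances reported later in the paper (around 2.8 m).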
The use of range cameras as precise measuring instruments requires the modelling of deviations from the ideal projection model. Sensor modelling is performed here using a flexible self-calibration routine proposed in Westfeld (2012) and Westfeld and Maas (in press). The calibration routine facilitates the determination of a precise range camera interior orientation. It further allows the estimation of range-measurement-specific correction parameters required for modelling the linear, cyclic and latency-related defects of a distance measurement made using a RIM camera. The integrated calibration approach jointly adjusts appropriate observations across both information channels (amplitude and range image), and also automatically estimates optimum observation weights using variance component estimation techniques. The method does not require spatial object information and therefore circumvents the time-consuming determination of reference distances with superior accuracy. An accuracy of 5 mm can be reached for a 3D point coordinate after calibration.
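As an illustration of the kind of range corrections such a calibration yields, the sketch below combines a constant offset, a linear scale and cyclic terms that repeat with the modulation half-wavelength. This is one common parameterisation for RIM distance errors, not necessarily the exact model of the cited calibration routine; all parameter names and values are assumptions.

```python
import math

def correct_range(d_raw, d_offset=0.0, d_scale=0.0, cyc=(), wavelength=7.5):
    """Apply a hypothetical RIM range correction to a raw distance (m):
    constant offset, linear scale, and cyclic terms of given order k,
    each a (amplitude_sin, amplitude_cos, k) tuple referred to the
    unambiguous wavelength (7.5 m assumed, cf. 20 MHz modulation)."""
    corr = d_offset + d_scale * d_raw
    for a, b, k in cyc:
        arg = 2.0 * math.pi * k * d_raw / wavelength
        corr += a * math.sin(arg) + b * math.cos(arg)
    return d_raw - corr
```

In the actual calibration these coefficients would be estimated jointly with the interior orientation in the self-calibrating bundle adjustment described above.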
5. Range camera image sequence analysis

In the following, automatic techniques for the 3D reconstruction of body postures from RIM data are described. The structure of this section corresponds to the workflow implemented: After a brief description of the data set used for method development (Section 5.1), the person of interest is segmented in order to simplify the analysis (Section 5.2). On the basis of amplitude and range information, 3D motion vectors are calculated for all body points derived from image segmentation (Section 5.3). Afterwards, the 3D RIM point clouds are mapped to an articulated human model at each time step (Section 5.4), whereby the segmentation information as well as the dense motion vectors provide the basis for an initial orientation and a continuous approximation model.
5.1. Data set

The typical repetitive movement of a warehouse worker provides the data used for method development. In Fig. 5, some amplitude and colour-coded range images of a sequence captured by a PMD[vision]® CamCube 2.0 range camera (Fig. 4(a)) are plotted. The input data are pre-processed using the integrated calibration routine mentioned in Section 4 to correct for optical and distance errors, and the measured slope distances are reduced to horizontal distances. 3D Cartesian object point coordinates are subsequently determined from the polar measurements using the corrected image coordinate vectors and slope distances.

5.2. Segmentation of person of interest

A straightforward analysis of both amplitude and range image is performed to segment all pixels belonging to the person of interest. In detail, the processing chain consists of the following image analysis cues:

Motion Variance: Calculate the empirical variances of each amplitude resp. range pixel over time, and segment the resulting histograms dynamically into regions where significant motions occur. Apply morphological operators to remove small holes in the resulting binary image.

Dynamic Thresholding: Separate the scene's foreground, person of interest and background by curve sketching of a Gaussian best fit to the range image histograms.

Region Growing: Define an initial set of background seed points. Add adjacent points depending on the differences between a pixel's value and the region's mean, and iteratively grow the regions.

Edge Detection: Find edges in the range images with a Laplacian of Gaussian (LoG) filter, label connected components in the resulting binary image, and connect edge segments to a closed object.

Each of the steps above may deliver suitable segmentation information on its own. However, to increase the accuracy and reliability of the procedure, the binary results were fused by multiplication into a single binary image sequence with a true-coded person of interest (Fig. 6).

5.3. Motion vector field

Motion vectors can be estimated for each image point x′(x′, y′) segmented in Section 5.2 by area-based matching between two consecutive epochs. 2.5D least squares tracking (LST) is an integrated matching technique which has been enhanced for the purpose of evaluating RIM data. The algorithm is based on the least squares image matching method (LSM; Ackermann, 1984; Grün, 1985; Förstner, 1986), and maps small surface segments of consecutive range camera data sets on top of one another. The mapping rule has been adapted to the data structure of a range camera on the basis of a 2D 1st order polynomial transformation. In addition to the usual affine transformation parameters used to include translation and rotation effects (a0, b0 resp. a2, b1), the scale and inclination parameters (a1, b2 resp. a3, b3) model perspective-related deviations caused by distance changes in the line of sight. The following formula describes the transformation between two patches in consecutive frames:
x′2 = x′1 + Δx′ = a0 + a1·x′1 + a2·y′1 + a3·x′1·y′1
y′2 = y′1 + Δy′ = b0 + b1·x′1 + b2·y′1 + b3·x′1·y′1    (2)
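Eq. (2) is a plain bilinear mapping and can be written directly as code, with parameter vectors a = (a0, a1, a2, a3) and b = (b0, b1, b2, b3):

```python
def lst_transform(x1, y1, a, b):
    """2D 1st order polynomial mapping between patches in consecutive
    frames, Eq. (2): a0, b0, a2, b1 carry translation and rotation,
    a1, b2 scale and a3, b3 inclination effects."""
    x2 = a[0] + a[1] * x1 + a[2] * y1 + a[3] * x1 * y1
    y2 = b[0] + b[1] * x1 + b[2] * y1 + b[3] * x1 * y1
    return x2, y2
```

With a = (0, 1, 0, 0) and b = (0, 0, 1, 0) the mapping is the identity; non-zero a0, b0 shift the patch, while deviations of a1, b2 from 1 encode the scale changes exploited in Eqs. (5) and (6).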
A closed parameterization combines amplitude values and range values in one integrated functional model. The geometric (Eq. (2))
Fig. 5. Simulated repetitive movement of a warehouse worker captured by a range camera: Carriage of goods from the floor into a storage rack and back.
Fig. 6. Binary segmentation results with black-coded pixel of interest.
and radiometric (offset and gain parameters r0, r1) relations between a template patch g1^A and a search patch g2^A, taken from consecutive amplitude images, lead to the first group of observation equations. It can be stated at each position x′ of sufficiently small image patches, taking some noise fraction e^A into account:

g1^A(x′, y′) − e^A(x′, y′) = r0 + r1 · g2^A(x′, y′)    (3)
The formulation of one observation equation per pixel for the corresponding horizontal range image patches g1^D⊥ and g2^D⊥ is done analogously:

g1^D⊥(x′, y′) − e^D⊥(x′, y′) = d0 + d1 · g2^D⊥(x′, y′)    (4)
Similar to r0 (additive brightness term) and r1 (multiplicative contrast term) as parameters of an amplitude value adjustment (Eq. (3)), d0 and d1 account for differences between g1^D⊥ and g2^D⊥. Distance changes between object and sensor as well as surface deformations influence the affine scale parameters (a1, b2). Assuming no significant deformations due to small image patches and a high sensor frame rate, and denoting the mean patch scale by λ = ½(a1 + b2), range displacements occurring in the depth direction can be expressed directly as a function of (a1, b2):
d0 = f(a1, b2) = g1^D⊥(x′, y′) − g2^D⊥(x′, y′) = g2^D⊥(x′_C, y′_C) · (λ − 1)    (5)

with x′_C as the centre pixel of a patch.
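Eq. (5) can be evaluated directly once the adjusted scale parameters are available; a minimal sketch, with distances in millimetres and hypothetical function names:

```python
def range_offset(a1, b2, g2_center):
    """Eq. (5): with the mean patch scale lambda = (a1 + b2) / 2, the
    additive range offset follows as d0 = g2(x'_C, y'_C) * (lambda - 1),
    where g2_center is the range value at the patch centre pixel."""
    lam = 0.5 * (a1 + b2)
    return g2_center * (lam - 1.0), lam
```

For example, a 2% patch scale in x at a horizontal distance of 2800 mm maps to a mean scale λ = 1.01 and a range offset of 28 mm along the line of sight.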
Patch rotation effects in depth direction caused by object tilts can be modelled using the additional 1st order polynomial transformation parameters (a3, b3):
d1 = f(x′, a3, b3) = (y′/2)·a3 + (x′/2)·b3    (6)
A more detailed derivation of the functional model of 2.5D LST is given in Westfeld and Hempel (2008) and Westfeld (2012). 2.5D LST is formulated as an iterative least squares adjustment procedure and minimises the sum of the squares of the amplitude and range value differences. The observation Eqs. (3) and (4) are non-linear. The linearised equations require initial values, obtained by hierarchically applying the technique on a resolution pyramid. Parameters that turned out to be insignificant in a significance test with a probability of 95% were excluded from the transformation. A fully non-significant set of parameters is interpreted as a statistically proven identification of motionless regions.

The complementary characteristics of the observations (amplitude and range) make them support each other due to the creation of a functional context for the measurement channels. This leads to an increase in accuracy and reliability (Westfeld, 2012). Additionally, 2.5D LST computes adjusted weights s²_A and s²_D⊥ (a priori variances) for each measurement data type and builds up the covariance matrix Σll of the observations by iterative variance component estimation (Westfeld and Hempel, 2008), thus assuring that the heterogeneous information pool is fully exploited.

The result of 2.5D LST applied to a short sample sequence (30 s, 20 fps; Section 5.1) is a dense 3D motion vector field with fully three-dimensional displacement vectors ΔX(ΔX, ΔY, ΔZ) (Figs. 7 and 8). Overall, 1.25 million displacement vectors were determined. Outliers in the results, for instance caused by scattering and multi-path effects at surface edges, were removed in an outlier detection procedure based on the following criteria:

Convergence Behaviour: In general, 2.5D LST converges in a few iterations. Matches with a diverging or oscillating solution were rejected.

Dynamic Thresholding: Translation vector lengths, a posteriori variances of the observations as well as a posteriori variances of the transformation parameters exceeding a specific threshold were eliminated. The thresholds were determined dynamically by applying preset sigma rules to each mean deviation distribution.

Neighbourhood Consistency: The differences of the transformation parameters between neighbouring patches were analysed. Surface points with deviations from their neighbourhood exceeding an automatically set limit following the 2-sigma rule were eliminated.
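The rejection criteria above can be sketched as a simple filter. The illustration below reduces them to a convergence flag combined with a 2-sigma rule on the translation vector lengths; all names are hypothetical, and the real procedure additionally thresholds a posteriori variances and checks neighbourhood consistency.

```python
import numpy as np

def sigma_filter(values, k=2.0):
    """Dynamic thresholding: keep samples within k standard deviations
    of the mean (the '2-sigma rule')."""
    v = np.asarray(values, float)
    return np.abs(v - v.mean()) <= k * v.std()

def reject_outliers(lengths, converged, k=2.0):
    """Keep a motion vector only if its LST adjustment converged and
    its translation vector length passes the dynamic sigma threshold."""
    return np.asarray(converged, bool) & sigma_filter(lengths, k)
```

Because the threshold is derived from the data themselves, the filter adapts to the actual motion distribution in each frame pair instead of relying on a fixed cut-off.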
The standard deviations of unit weight produced by the least squares adjustment process, averaged over all accepted image patches, are 4.09 grey values for the 16-bit amplitude channel and 4.99 mm for the range channel. The root mean square of the a posteriori standard deviations ŝ_{a0,b0} of the translation parameters (a0, b0) is 1/24 px. The mean a posteriori standard deviation ŝ_{d0} of
Fig. 7. Colour-coded vector lengths determined by 2.5D LST illustrate the range of motion between two consecutive frames. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Colour-coded 3D motion vector field representation: Lifting (a), placing (b) and holding (c) of objects are typical awkward movements of warehousemen. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
the additive range correction term d0 is 0.40% of the measured horizontal distance. A single object point movement ΔX(X, Y, Z) can thus be determined with an internal precision in the order of ŝ_{X,Y} = 0.4 mm and ŝ_Z = 7.0 mm at an average distance of 2.8 m. Note that these errors are correlated between consecutive time frames, and accelerations derived from these motion vectors may thus show a better precision.

5.4. Potential energy surface model fit

The human model M (Section 3) comprises several body segments s and can be described by a set of parameters (X0, ω, φ, κ)^s. Mt is best fitted to all points Xt(X, Y, Z) of a range camera point cloud at a certain time instance t (Fig. 9). M's new alignment at t + 1 can then be determined iteratively by minimising the sum of the
squares of the Euclidean distances between the model surface and relevant points of the cloud. Relevance means that only points which are part of the real-world body are taken into account. The potential energy surface model fit used here is based on a 2D image analysis algorithm for road extraction from satellite images using vector reference data (Prechtel and Bringmann, 1998). It allows a fast and robust determination of an optimisation criterion Ω and works as follows:

1. Define a voxel space with an adequate dimension and resolution (typically 10 million voxels, 10.0 mm/voxel).
2. Build a potential energy surface Wt+1 = W(Xt+1) from the current range camera point cloud by filling up the voxel space. Voxels at valid point positions obtain the initial value 0, the remainder obtain 1.
3. The resulting potential energy surface is smoothed by linear distance transforms on the voxel grid (Rosenfeld and Pfaltz, 1968; Fig. 10).
4. Sample the previous model Mt and grip the corresponding voxel space values at each sampling point XMt(X, Y, Z). The sum of those values yields a new Ω over all body points i:

Ω = Σ_i Wt+1(XMt,i) → min    (7)
5. Repeat the parameter variations until Ω reaches a minimum, so that the set of parameters of Mt+1 is optimally adjusted to Xt+1.
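The five steps above can be sketched in miniature. For brevity the illustration below works on a 2D grid instead of the 10 million voxel space, uses a two-pass chamfer sweep as the linear distance transform, and replaces the articulated model by a rigid bar whose column position is the only free parameter; everything here is a hypothetical toy, not the authors' implementation.

```python
import numpy as np

def distance_transform(occ):
    """Linear (chamfer-style) distance transform on a 2D grid: cells at
    valid point positions start at 0, the rest at a large value; two
    sweeps propagate distances (cf. Rosenfeld and Pfaltz, 1968)."""
    d = np.where(occ, 0.0, 1e9)
    h, w = d.shape
    for i in range(h):                 # forward sweep
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i - 1, j] + 1)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    for i in range(h - 1, -1, -1):     # backward sweep
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i + 1, j] + 1)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d

def omega(surface, model_pts):
    """Eq. (7): sum of potential energy surface values at the sampled
    model points; zero when the model lies exactly on the point cloud."""
    return sum(surface[i, j] for i, j in model_pts)

# Toy fit: the 'point cloud' occupies a vertical bar at column 6; the
# 'model' is a bar at column c, and c is found by brute-force search.
occ = np.zeros((10, 10), bool)
occ[2:8, 6] = True
W = distance_transform(occ)
best = min(range(10), key=lambda c: omega(W, [(r, c) for r in range(2, 8)]))
```

The gradient of the smoothed surface pulls the model towards the data even from a poor initial position, which is what makes the criterion robust against the sparse, noisy occupancy of a single point cloud.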
Fig. 9. CAD manikin to range camera point cloud adjustment by potential energy surface model fit between two consecutive epochs t and t + 1.
Note that at this stage, the potential energy surface Wt+1 = W(Xt+1) is calculated for the entire body surface. A separation between the single body segments shown in Fig. 3 is given implicitly
Fig. 10. Potential energy surface representation: (a) Range camera point cloud of the upper part of the body and one layer of the potential energy surface with linear decaying zero values (white) at valid point positions. (b) Three layers of a section through the potential energy surface voxel space.
by the body part representation in AutoCAD®, thus by the translation and rotation parameters (X0; ω, φ, κ)^s of each meshed CAD manikin body segment s itself (Eq. (1), Section 3).

The model fit requires approximate values for the unknown set of parameters to map an arbitrary posture M(X0^hip, ω, φ, κ). The parameters are initialised on the basis of a pre-defined starting position. Due to the relatively high sensor frame rate and a steady flow of motion, a simple gradient approach can be used as the approximation procedure. Alternatively, motion information derived from the 2.5D LST vector field discrepancies may provide high-quality approximate values (Section 5.3).

If additional information channels are available, the model fit can be extended to higher-dimensional potential energy surfaces. A second potential energy surface Ut+1 = U(ΔXt+1) can be built up by using the 3D displacement vector field ΔX, which quantifies the dimension and the direction of motion at each body point (Section 5.3). Adding further summands is easily possible, improves the accuracy and reliability of the adjustment, and leads to an nD potential energy surface model. Those summands can, for example, comprise separate models to fit single body parts, like Kt+1 = K(Xt+1) for all sampling points XHt of a single human head model Ht as a subset of Mt. In contrast to a full body scan, the intense motions of the proband's extremities are thus contained. The extended optimisation criterion Ω becomes
Ω = Σ_i Wt+1(XMt,i) + Σ_i Ut+1(ΔXMt,i) + Σ_j Kt+1(XHt,j) + …    (8)
where i denotes the index over all body points and j the index over the head segment points only.
Critical workplace body postures are characterised by specific angles between single body parts (Section 2). The resulting set of parameters determined by the optimisation corresponds to these angles and describes arbitrary postures of the human locomotive system at certain time steps (Fig. 11). On the basis of these pose and motion parameters, the joint positions taken are evaluated within a risk model and can easily be classified as critical or not by considering ergonomic aspects (Section 6).

6. Risk model results

The risk model developed is based on the OWAS method (Section 2). It includes posture-specific value ranges for comfort levels and joint angle limits, which state the required valuation rules. The elemental anatomical degrees of freedom and joint angle limits are derived from OWAS and transferred into the risk model. Time-resolved ergonomics data for analysing the risk of the static and dynamic postures adopted are also introduced. On the basis of these data, the risk model assesses the time-resolved parameter variations of the human model fitted in Section 5.4 by comparison with any arbitrary posture combination stored, and identifies critical workplace body postures.

Two critical postures of the simulated repetitive movement of a warehouse worker are shown in Fig. 12. The corresponding CAD manikins (Fig. 11) were modelled using the parameters determined in the course of the automatic 3D reconstruction (Section 5). The risk assessment evaluates the whole body posture at each time step, composed of the single postures of each body part category (back, arms, legs and head) as well as the load handled.
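A minimal sketch of such an angle-based classification for the back category: only the '>10° flexion' rule quoted in Section 2 is taken from the paper; the remaining thresholds and posture codes are invented for illustration and are not the official OWAS definitions.

```python
def classify_back(flexion, lateral, rotation):
    """Map back joint angles (degrees, from the fitted manikin) to a
    hypothetical OWAS-style back posture code. Only the >10 deg flexion
    threshold is taken from the paper's criteria catalogue example."""
    if flexion > 10.0 and (lateral > 10.0 or rotation > 10.0):
        return 4   # bent and twisted
    if flexion > 10.0:
        return 2   # bent (basic working posture No. 2)
    if lateral > 10.0 or rotation > 10.0:
        return 3   # twisted / inclined
    return 1       # upstanding position
```

Analogous classifiers for arms, legs and head, combined with the load handled, would yield the whole body posture code evaluated at each time step.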
Fig. 11. Result of the potential energy surface model fit: Two range camera point clouds with their best-fitted human models exemplarily show lifting and holding as two critical body postures.
Fig. 12. Two critical postures, combined by single body part postures, exemplify the results of the automatic 3D reconstruction of the repetitive movement of a warehouse worker.
Fig. 13. Analysis tool for posture assessment.
Specifications can be given on how long single body part postures are adopted during the image sequence, which allows a final assessment. In accordance with the OWAS method, the work movement corresponds to Nº2 of a four-part action category scheme: 'The adopted body postures can cause high strains on the musculoskeletal system. Measures should be taken in the near future.'

The development of a graphical user interface (GUI) allows those people actually involved in practical work to interact intuitively with the automatically generated data. The adopted posture of the entire body, represented by its single body parts, as well as the duration of a certain position are analysed, and the measures to be taken are given as a four-part traffic light scheme. In detail, the GUI of the posture assessment tool is separated into four regions (Fig. 13): Regions A and B distinguish between static and dynamic body part postures; region C contains their percentage distributions within a pre-defined interval. The time slices of the body part postures are shown as pie charts in region D.

After a first operational test, the results of the method developed were compared with those of an interactive evaluation. The amplitude channel of the captured sample data set was analysed manually on screen through a traditional paper-and-pencil test. The results categorising the performed movement show a high degree of correspondence, and the resulting action category is the same as that of the new automatic approach for the 3D reconstruction of awkward body postures in the workplace. Further validation is necessary to achieve reliable statistical information. A major advantage of an automated survey is the increase in objectiveness. The spatial and temporal resolution can be clearly increased and now only depends on the sensor's properties. The reduction of processing efforts resp. human resources is a further benefit.
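The duration-based final assessment could be sketched as follows; the mapping of time shares to the four action categories is a hypothetical placeholder for illustration, not the official OWAS rules, which also weigh the posture combination itself.

```python
def action_category(posture_codes, thresholds=(0.1, 0.4, 0.7)):
    """Map the share of frames spent in non-neutral postures
    (code > 1) over an image sequence to one of four OWAS-style
    action categories. The thresholds are illustrative assumptions."""
    share = sum(1 for c in posture_codes if c > 1) / len(posture_codes)
    for cat, t in enumerate(thresholds, start=1):
        if share <= t:
            return cat
    return 4
```

Fed with the per-frame posture codes of the fitted manikin, such a rule would produce the final traffic-light style recommendation shown in the GUI.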
Finally, the analysis conducted is suited as documentation in terms of the German Occupational Safety and Health Act (Section 6, Subsection 1). A disadvantage is that the vision-based approach cannot make any statement about the weight of the load handled.

7. Conclusions

The use of a range imaging camera to gather data on awkward body postures in workplaces, and the targeted development of procedures for evaluating range camera image sequences, contribute to current efforts in the field of automated videographic documentation of body motion. The system developed offers people working in the field of ergonomics a helpful tool for an objective and efficient assessment of working conditions as well as for the documentation of the results. At the same time, interesting novel application fields open up for photogrammetry.

Future work can concentrate on a sub-segmentation of single body parts as additional dimensions for the potential energy surface model fit. The integration of the trajectories of anatomically relevant body points tracked by 2.5D LST could be promising, too. A full evaluation of the potential energy surface model fit is still pending. The OWAS working posture assessment method chosen considers the weight of the load handled; thought must be given to how such an attribute can be incorporated into an otherwise vision-based approach. A further restriction is that the head movements captured and analysed cannot be handled by OWAS. Future work should therefore also focus on an economical extension of the set of evaluation methods to be automated.

Acknowledgements

The research work presented in this paper has been funded by the German Federal Ministry of Economics and Technology (BMWi) via AiF ZIM Project KF2522101SS9.
References

Ackermann, F., 1984. High precision digital image correlation. In: Proceedings of the 39th Photogrammetric Week, vol. 9. Schriftenreihe der Universität Stuttgart, pp. 231–243.
Breidt, M., Bülthoff, H.H., Curio, C., 2010. Face models from noisy 3D cameras. In: ACM SIGGRAPH ASIA 2010 Sketches, SA '10. ACM, New York, NY, USA, pp. 12:1–12:2.
Brenscheidt, F., Lüther, S., Siefer, A., 2012. Arbeitswelt im Wandel: Zahlen – Daten – Fakten, first ed. Bundesanstalt für Arbeitsschutz und Arbeitsmedizin, Dortmund, D. ISBN: 978-3-88261-706-1.
Breuer, P., Eckes, C., Muller, S., 2007. Hand gesture recognition with a novel IR time-of-flight range camera – a pilot study. In: Proceedings of the Mirage 2007 Computer Vision/Computer Graphics Collaboration Techniques and Applications, Rocquencourt, France, 28–30 March 2007, pp. 247–260.
Diraco, G., Leone, A., Siciliano, P., 2010. An active vision system for fall detection and posture recognition in elderly healthcare. In: Proceedings of the Conference on Design, Automation and Test in Europe, DATE '10. European Design and Automation Association, Leuven, Belgium, pp. 1536–1541.
Ellegast, R., 1998. Personengebundenes Messsystem zur automatisierten Erfassung von Wirbelsäulenbelastungen bei beruflichen Tätigkeiten. BGIA-Report 5. Hauptverband der gewerblichen Berufsgenossenschaften (HVBG), Sankt Augustin, D. ISBN: 3883835072.
Ellegast, R., 2005. Fachgespräch Ergonomie 2004. BGIA-Report 4. Hauptverband der gewerblichen Berufsgenossenschaften (HVBG), Sankt Augustin, D. ISBN: 3-88383-687-7.
EU, 1997. Guidance on Risk Assessment at Work. European Commission, Directorate-General V, Luxembourg. ISBN: 92-827-4278-4.
Förstner, W., 1986. A feature based correspondence algorithm for image matching. Archives of Photogrammetry and Remote Sensing XXVI, 150–166.
Grün, A., 1985. Adaptive least squares correlation – a powerful image matching technique. South African Journal of Photogrammetry, Remote Sensing and Cartography 14, 175–187.
Hückstädt, U.H., Herda, C., Ellegast, R., Hermanns, I., Hamburger, R., Ditchen, D., 2007. Muskel-Skelett-Erkrankungen der oberen Extremität und berufliche Tätigkeit. BGIA-Report 2. Hauptverband der gewerblichen Berufsgenossenschaften (HVBG), Sankt Augustin, D. ISBN: 9783883837222.
Hempel, R., Westfeld, P., 2009. Statistical modeling of interpersonal distance with range imaging data. In: Esposito, A., Hussain, A., Marinaro, M., Martone, R. (Eds.), Multimodal Signals: Cognitive and Algorithmic Issues. Lecture Notes in Computer Science, vol. 5398. Springer, Berlin, Heidelberg, pp. 137–144.
Holte, M.B., Moeslund, T.B., Fihl, P., 2010. View-invariant gesture recognition using 3D optical flow and harmonic motion context. Computer Vision and Image Understanding 114, 1353–1361.
Jensen, R.R., Paulsen, R.R., Larsen, R., 2009. Analysis of gait using a treadmill and a time-of-flight camera. In: Proceedings of the DAGM 2009 Workshop on Dynamic 3D Imaging, Dyn3D '09. Springer-Verlag, Berlin, Heidelberg, pp. 154–166.
Kamusella, C., Schmauder, M., 2009. Ergotyping im rechnerunterstützten Entwicklungs- und Gestaltungsprozess. Zeitschrift für Arbeitswissenschaft 3, 212–222.
Karhu, O., Kansi, P., Kuorinka, I., 1977. Correcting working postures in industry: a practical method for analysis. Appl. Ergonom. 8 (8), 199–201.
Kilbom, Å., 1994. Repetitive work of the upper extremity: Part I – guidelines for the practitioner. Int. J. Ind. Ergonom. 14, 51–57.
Kivi, P., Mattila, M., 1991. Analysis and improvement of work postures in the building industry: application of the computerised OWAS method. Appl. Ergonom. 22, 43–48.
Lahamy, H., Lichti, D.D., 2012. Towards real-time and rotation-invariant American Sign Language alphabet recognition using a range camera. Sensors 12 (11), 14416–14441.
Maas, H.-G., 1997. Concepts of real-time photogrammetry. Hum. Movement Sci. 16 (2–3), 189–199.
McAtamney, L., Corlett, E.N., 1993. RULA: a survey method for the investigation of work-related upper limb disorders. Appl. Ergonom. 24 (2), 91–99.
Meers, S., Ward, K., 2008. Head-pose tracking with a time-of-flight camera. In: Proceedings of the Australian Conference on Robotics and Automation, Canberra, AUS, December 2008, pp. 113–116.
Motion Analysis Corporation, 2012. The Industry Leader for 3D Passive Optical Motion Capture.
PMDTec, 2010. Datasheet PMD[vision] CamCube 3.0. PMDTechnologies GmbH, Siegen.
Prechtel, N., Bringmann, O., 1998. Near-real-time road extraction from satellite images using vector reference data. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Cambridge, UK, 13–17 July, pp. 229–234.
Ördögh, L., Gehér, K., Kamusella, C., Szeredy, C., 2011. Virtual human engineering plugin. In: International Summit on Human Simulation, St. Pete Beach, Florida, US, 26–27 May.
Rosenfeld, A., Pfaltz, J.L., 1968. Distance functions on digital pictures. Pattern Recogn. 1 (1), 33–61.
Schmitter, D., 2005. Ermitteln der körperlichen Belastung bei Tätigkeiten im Sitzen, first ed. Mitteilung 88212.d, Schweizerische Unfallversicherungsanstalt (SUVA), Luzern, CH.
Schwarte, R., 1996. Eine neuartige 3D-Kamera auf der Basis eines 2D-Gegentaktkorrelator-Arrays. In: Aktuelle Entwicklungen und industrieller Einsatz der Bildverarbeitung. MIT GmbH, Aachen, pp. 111–117.
Schwarte, R., Heinol, H., Buxbaum, B., Ringbeck, T., Xu, Z., Hartmann, K., 1999. Principles of three-dimensional imaging techniques. In: Jähne, B., Haußecker, H., Geißler, P. (Eds.), Handbook of Computer Vision and Applications – Sensors and Imaging, vol. 1. Academic Press, pp. 463–484 (Chapter 18).
SIMI Reality Motion Systems GmbH, 2012. 2D/3D Bewegungsanalyse und Verhaltensanalyse. Unterschleißheim, D.
Spirig, T., Seitz, P., Vietze, O., Heitger, F., 1995. The lock-in CCD – two-dimensional synchronous detection of light. IEEE J. Quant. Electron. 31 (9), 1705–1708.
Westfeld, P., 2007. Development of approaches for 3-D human motion behaviour analysis based on range imaging data. In: Grün, A., Kahmen, H. (Eds.), Optical 3-D Measurement Techniques VIII, vol. II. Institute of Geodesy and Photogrammetry, ETH Zurich, Zurich, pp. 393–402.
Westfeld, P., 2012. Geometrische und stochastische Modelle zur Verarbeitung von 3D-Kameradaten am Beispiel menschlicher Bewegungsanalysen. Dissertation, Technische Universität Dresden, Fakultät Forst-, Geo- und Hydrowissenschaften, Professur für Photogrammetrie.
Westfeld, P., Hempel, R., 2008. Range image sequence analysis by 2.5-D least squares tracking with variance component estimation and robust variance covariance matrix estimation. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Beijing, China, 14–16 September, pp. 457–462.
Westfeld, P., Maas, H.-G., in press. Integrated 3D range camera self-calibration. Photogrammetrie, Fernerkundung, Geoinformation (PFG).