Analysis of Human Pointing Behavior in Vision-based Pointing Interface System - difference of two typical pointing styles -


13th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems, Aug. 30 - Sept. 2, 2016, Kyoto, Japan




IFAC-PapersOnLine 49-19 (2016) 367–372

Kazuaki Kondo* Genki Mizuno* Yuichi Nakamura*

* Academic Center for Computing and Media Studies, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan (email: kondo, mizuno, [email protected])

Abstract: This paper reports on human pointing behaviors in a vision-based pointing interface system, with the aim of building a mathematical model of them for interface design. In natural pointing situations, we point at targets at distant positions with various postures, for example a straight-arm style or a bent-elbow style. We analyze the difference in pointing behaviors between these styles, modeling the pointing interface system as a feedback control loop that includes the indicator. The difference is confirmed in the step responses and in the estimated parameters of the transfer functions, and it matches our actual experience of those pointing styles. The accuracy of estimating the indicated position from the indicator's posture in intermediate styles has also been analyzed. The results show that the reference point of indication moves smoothly from the indicator's eye to his or her elbow according to the elbow joint angle.
© 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Pointing systems, Human-machine interface, Control theory, Closed-loop systems

1. INTRODUCTION

Nowadays we can get wide visual display devices inexpensively, which makes it easy to show various contents on large regions. This increases the distance between displays and users. In such environments, an easy and intuitive remote interaction scheme is required rather than a contact interface such as a touch panel. For example, in a presentation scene with slides on a wide screen, it is not reasonable for audience members to come close to the screen to point at a particular portion of the slide directly. We usually use a pointing gesture in such a case. The pointing interface system shown in Fig. 1 supports such remote pointing based on visual measurement: it shows a pointer at an estimated indication position on a display, based on capturing the indicator's posture. With this system, the indicated position becomes clear to the indicator and the audience, which encourages smooth communication. We often use a laser pointer for a similar purpose, but it can be used only for indicating. The vision-based pointing interface controls displayed contents and materials according to the user's posture, and can construct interactive schemes beyond mere indicating. Additionally, when robots living with humans and/or virtual agents are connected to the pointing interface, they can obtain information about indicators and behave adaptively.

Kondo et al. (2015) modeled the vision-based pointing environment as a feedback loop under classical control theory. The model mathematically describes the indicator's pointing behaviors and visual perception using transfer functions, in order to simulate them under various configurations for designing an easy-to-use interface, and to predict pointing behaviors for adaptive display. However, the proposed model assumed an indicating posture with the arm held straight, and did not cover postures with a bent elbow joint, which also appear in natural pointing situations.

The purpose of this paper is to analyze human pointing behavior in the two pointing styles, in order to expand the proposed pointing interface model into a general one that can deal with arbitrary pointing types. We analyze how the pointing styles affect pointing behaviors and how these effects can be described in the interface system model through measurements of actual pointing. Additionally, we investigate what kind of method should be selected to estimate indicated positions in the case of intermediate pointing styles, e.g. slightly bent elbow postures.

2. RELATED WORKS

The real-time vision-based pointing interface has become practical with the progress of computational and visual sensing performance. However, we still face two considerable problems for natural pointing in daily environments. One is the difficulty of accurately measuring the pointing pose. Significant accuracy is required, especially for a target at a distance, because even tiny errors in the coordinates of body parts are amplified on the screen. Several marker-less motion capture techniques based on visual sensing do not interfere with an indicator's behavior, e.g. Shotton et al. (2011); Yoshimoto and Nakamura (2015), but their performance does not satisfy the requirement, because visual sensing is unstable against illumination changes and occlusion. The latency arising from the sampling time of visual sensing and the processing time for estimation also cannot be ignored.
ISSN 2405-8963. Peer review under responsibility of International Federation of Automatic Control.
DOI: 10.1016/j.ifacol.2016.10.593




Fig. 2. The indicated position model for two typical pointing styles. (Left) with straight arm (Right) with bent elbow


Fig. 1. An overview of a vision-based pointing interface. The vision sensor measures the indicator's posture and estimates the indicated position to display a pointer at that location.

The other problem is the ambiguity in human pointing behavior. It depends on the position of the indicating target, the indicator's intention, and also individuality. A pointing pose relative to a screen indicates only approximately where a person is pointing. Fukumoto et al. (1994) reported that a target position lies on a line defined by the fingertip and a reference point inside the indicator's body. The reference point moves depending on the pointing pose: as shown in Fig. 2, it is placed at the eye position for a distant target, but at the elbow position for a relatively near target. In addition, the geometric environment perceived by a human may not match the actual one. The relationship between a pointing posture and the indicated position is thus complicated and influenced substantially by various conditions, which makes it difficult to estimate the indicated position accurately from the pointing pose.

Characteristics of the transient pointing behavior while an indicator changes the pointing target have been analyzed for a long time. Woodworth (1899) proposed a pointing-action model that combines a feed-forward motion for a rapid approach to the target position with subsequent feedback adjustments. Fitts (1954) reported that pointing duration increases with a larger moving distance or a smaller target, a relation known as "Fitts's law", which has been used in many studies because of its accurate approximation under various conditions.

The above problems suggest a need for additional schemes that reduce the influence of the ambiguity of pointing behaviors and of measurement errors, in order to construct an easy-to-use pointing interface. Most conventional methods tackling this issue focus on how to visualize the pointer and contents on a screen. McGuffin and Balakrishnan (2005) proposed zooming the region around the pointer, which makes the target size and the distance to it feel larger in its neighborhood. To obtain a similar effect, controlling cursor size or cursor speed has also been proposed (Worden et al. (1997); Grossman and Balakrishnan (2005); Blanch et al. (2004)). Retaining the pointer trajectory over the last short duration, proposed by Baudisch et al. (2003), helps users recognize and predict the behavior of the pointing interface intuitively. However, those methods assume a mouse interface and have not been evaluated with a remote vision-based pointing interface such as we assume here. Thus, we need a general framework that enables us to evaluate, compare, combine, and improve such conventional methods under various conditions.
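As a concrete illustration of the quantitative side of this related work, Fitts's law cited above is easy to state in code. A minimal sketch follows; the coefficients a and b are hypothetical stand-ins that would have to be fit per user and per device.

import math

def fitts_movement_time(distance, width, a=0.2, b=0.1):
    """Predicted pointing duration (s) for moving a distance D to a target
    of size W, using the index of difficulty log2(2D/W) from Fitts (1954).
    The coefficients a and b are illustrative assumptions."""
    return a + b * math.log2(2.0 * distance / width)

# Doubling the distance or halving the target width each adds b seconds.
print(fitts_movement_time(70.0, 5.0))   # e.g. a 70 cm reach to a 5 cm target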

3. MODEL OF POINTING INTERFACE SYSTEM

In this paper, we assume a vision-based pointing system of the sort shown in Fig. 1. It proceeds as follows (a toy sketch of the loop appears after the list).

(1) The indicator has a target he or she wants to indicate in mind. We define its position as the reference position pt.
(2) The computer estimates its location pe based on visual sensing through the camera. The pointer is displayed at the location pc, derived from pe through some visualization filters.
(3) The indicator recognizes the pointer location as pr and adjusts his or her pointing posture to move it close to the target.
(4) Steps (2)-(4) continue until pt − pr becomes 0.¹
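The following deliberately simplified toy makes the loop concrete: sensing, visualization, and perception are taken as ideal (ds = 0, N = 1), and a crude proportional adjustment stands in for the indicator's posture control, which the transfer functions below model properly. All names and the gain value are illustrative assumptions.

def pointing_loop(p_t, posture=0.0, gain=0.5, tol=1e-3, max_steps=100):
    """Toy version of steps (1)-(4) on a 1-D screen coordinate."""
    p_c = posture
    for _ in range(max_steps):
        p_e = posture              # (2) estimation, ideal Hs with ds = 0
        p_c = p_e                  # (2) visualization, Hv with N = 1
        p_r = p_c                  # (3) perception, ideal Hp
        error = p_t - p_r
        if abs(error) < tol:       # (4) stop when pt - pr is (near) zero
            break
        posture += gain * error    # (3) posture adjustment, crude Hg
    return p_c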

This procedure can be modeled as a feedback control loop, as shown in Fig. 3. The control model is constructed from Hg for the indicator's body kinematics, Hp for his or her visual perception of the pointer, Hs for the computer estimating the indicated position, and Hv for the visualization filter. In this model, the indicator works as a controller through Hg and simultaneously adjusts a feedback gain through Hp. The visualization Hv covers the pointer's shape, position, and so on. The two problems of a vision-based pointing interface, the pointing-pose ambiguity and the pointing-pose estimation error, appear in Hg and in the noise ds, respectively.

3.1 Pointing interface part

The estimation of the indicated position Hs consists of visual sensing via cameras and a pose-estimation algorithm based on the measurement. We assume that it correctly estimates the indicated position with a particular latency τs, the estimation error being included in the noise term ds, and formulate Hs as

H_s(s) = e^{-\tau_s s}.   (1)

On the other hand, the formulation of Hv is determined by the visualization method for the pointer and contents. We assume that a sufficiently small circular pointer is displayed at a smoothed position of the estimated pointing positions over the latest short duration N. This visualization formulates Hv as

H_v(s) = \frac{1}{N} \sum_{n=0}^{N-1} e^{-(n\tau_s + \tau_v)s},   (2)

where τv is the latency for displaying the cursor on the screen. Eq. (2) with N = 1 corresponds to a typical display without smoothing. Other pointer visualizations can also be implemented using similar formulations. Note that a large circular pointer, or displaying a trajectory of estimated positions, can introduce multiple factors affecting human visual perception; this requires a multidimensional design of Hv and Hp.
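In discrete time, the smoothing in Eq. (2) is simply a moving average over the latest N estimated positions. A minimal sketch of this visualization filter follows; the class name and update interface are illustrative assumptions.

from collections import deque

class SmoothedPointer:
    """Displays the mean of the last N estimated positions (Eq. (2) without
    the latency terms); N = 1 reproduces an unsmoothed display."""
    def __init__(self, N=1):
        self.history = deque(maxlen=N)

    def update(self, pe):
        # pe: newly estimated indicated position (e.g. a screen coordinate)
        self.history.append(pe)
        return sum(self.history) / len(self.history)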

¹ In this paper, we focus only on the indicator's behavior, assuming that the audience's perception is the same as the indicator's.





Fig. 3. The control model of the vision-based pointing interface system. The modules Hg, Hp, Hs, and Hv describe the indicator's body kinematics, his or her visual perception, the estimation of the pointing position, and the cursor visualization, respectively. ds denotes a disturbance entering the position estimation.

3.2 Indicator part

The characteristics of human pointing actions are still uncertain, especially from an analytic viewpoint. We configured a mathematical model of Hg based on measurements of actual pointing behaviors. In preliminary experiments, we analyzed indicated-position trajectories without cursor visualization, i.e., with no feedback information provided to the indicator. The pointing behavior when a pointing target suddenly moves to a distant location can be considered a step response of Hg. We confirmed that the trajectories converge after some overshoots, and therefore formulated Hg as a second-order lag element

H_g(s) = e^{-\tau_g s} \frac{K_g}{T_g^2 s^2 + 2\zeta T_g s + 1},   (3)

where τg, Tg, and ζ denote the dead time before a pointing action begins, a parameter determining pointing speed, and a damping coefficient of the pointing fluctuation, respectively.

Hp is also difficult to formulate explicitly, because pr is an internal value of the indicator and cannot be measured directly. Thus, we consider Hp to be a first-order lag element

H_p(s) = \frac{K_p}{T_p s + 1}   (4)

to reflect the delay in human perception and to avoid overfitting in the identification phase.
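To make the model concrete, the sketch below assembles Eqs. (1)-(4) into the closed loop of Fig. 3 and simulates a step response, using the python-control package with Padé approximations for the dead times. All numeric values are illustrative assumptions on the order of the estimates reported later in Table 1 (interpreted in seconds), and Kp is tied to Kg so that the loop has zero stationary error for a step input, the convergence constraint applied later during parameter estimation.

import numpy as np
import control

def dead_time(tau, order=3):
    """Rational (Pade) approximation of a pure delay e^(-tau*s)."""
    num, den = control.pade(tau, order)
    return control.tf(num, den)

# Illustrative parameter values (seconds); not the paper's estimates.
Kg, Tg, zeta, tau_g = 1.01, 0.115, 0.64, 0.38   # indicator kinematics, Eq. (3)
Tp = 0.05                                        # perception lag, Eq. (4)
Kp = (Kg - 1.0) / Kg    # closed-loop DC gain of 1, i.e. zero stationary error
tau_s, tau_v = 0.033, 0.017                      # sensing and display latencies

Hg = dead_time(tau_g) * control.tf([Kg], [Tg**2, 2*zeta*Tg, 1])  # Eq. (3)
Hs = dead_time(tau_s)                                            # Eq. (1)
Hv = dead_time(tau_v)                            # Eq. (2) with N = 1
Hp = control.tf([Kp], [Tp, 1])                                   # Eq. (4)

# Closed loop from reference position pt to estimated position pe (Fig. 3):
# forward path Hg*Hs, feedback path Hv*Hp, negative feedback at the comparator.
T_pt_to_pe = control.feedback(Hg * Hs, Hv * Hp)
t = np.linspace(0.0, 4.0, 2001)
t, pe = control.step_response(0.7 * T_pt_to_pe, t)   # 0.7 m target jump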

4. SYSTEM IMPLEMENTATION FOR EXPERIMENTS

An overview of the pointing interface implemented for the experiments is shown in Fig. 4. The pixel × pixel visual contents are projected onto the m × m screen (a white wall) using a short-focal-length projector. Subjects indicate particular points on the screen from approximately 2.5 m away.

Fig. 4. The experimental environment.

For natural pointing, indicating postures are measured by a Kinect v2 sensor placed close to the screen, to the left of the subjects. In the following experiments, however, we use magnetic field-based 3D position sensors attached to the subject's body instead of the Kinect sensor, to realize the assumption ds = 0 explained in Section 3.1. While this implementation would not be acceptable in practical use, it is acceptable for analyzing human pointing behaviors. The available measurement space of the magnetic field sensor is roughly a 1 m cube, which makes it difficult to calibrate the geometric relation to the screen directly. Thus, we estimated it via the Kinect sensor coordinate system, i.e., as the combination of the rigid transformation between the magnetic field sensor and the Kinect sensor and that between the Kinect sensor and the screen. We assumed a homography transformation between the screen and the contents in the computer.

The indicated position is estimated as the crossing point of the screen and the indicating vector that connects the reference point in the body and the fingertip, based on the report of Fukumoto et al. (1994). This corresponds to the actual processing of Hs. When the indicator's arm is straight (named "pointing style A" in this paper), the magnetic field sensors are attached to his or her temple and fingertip to acquire the indicating vector (the "eye-reference estimation model"). When the elbow joint is bent (named "pointing style B"), the sensor on the elbow is used for the reference point (the "elbow-reference estimation model"). An approximately 1 cm circular pointer is displayed at the estimated position without temporal smoothing, i.e., N = 1 in Hv. This pointer can be regarded as a point by the indicator.
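The estimation step of Hs described above reduces to a ray-plane intersection. A minimal sketch under assumed names follows, with the reference point taken at the temple (style A) or the elbow (style B), following the model of Fukumoto et al. (1994).

import numpy as np

def indicated_position(reference, fingertip, screen_point, screen_normal):
    """Intersect the indicating vector (reference -> fingertip) with the
    screen plane, all in one calibrated 3D coordinate system. Returns the
    3D crossing point, or None if the ray is (near-)parallel to the screen."""
    direction = fingertip - reference
    denom = np.dot(screen_normal, direction)
    if abs(denom) < 1e-9:
        return None
    t = np.dot(screen_normal, screen_point - reference) / denom
    return reference + t * direction

# Example: screen plane z = 0, subject about 2.5 m in front of it.
temple    = np.array([0.0, 1.60, 2.5])   # eye reference (pointing style A)
fingertip = np.array([0.1, 1.40, 1.9])
p = indicated_position(temple, fingertip,
                       screen_point=np.array([0.0, 0.0, 0.0]),
                       screen_normal=np.array([0.0, 0.0, 1.0]))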

5. DIFFERENCE OF POINTING BEHAVIORS IN TWO POINTING STYLES

We focus on step responses of the pointing interface system to compare pointing behavior between the two pointing styles. A step input signal corresponds to the pointing target suddenly popping up at a distance from the initial position, and its response corresponds to the transient trajectory until the indicator feels that the indication of the target is finished. The experimental procedure is as follows.


Fig. 5. The step response trajectories for the two pointing styles. The horizontal and vertical axes represent the transit time from the step input and the distance from the initial location, respectively. The black trajectories are the actually measured samples; the red ones are trajectories simulated with the optimized model parameters estimated from the measured samples. (Top row) pointing style A. (Bottom row) pointing style B.

(1) The measurement begins when a subject indicates an initial target visualized on the screen and the pointer stays at that position.
(2) The initial target suddenly disappears. Simultaneously, a new target pops up at a 70 cm distance from the initial target. The subject changes his or her posture to move the pointer onto the new target.
(3) The measurement stops when the subject declares that the pointing action is finished.

The measurement results for three subjects are shown as the black trajectories in Fig. 5. We can see larger overshoots and fluctuations in the results of pointing style B than of style A, which matches our daily experience. A possible reason is that a small motion of the fingertip is reflected as a larger motion on the screen in pointing style B than in style A, because the distance between the indication reference point and the fingertip is larger. Furthermore, it is difficult for indicators to perceive the indication vector correctly, because they cannot see the elbow joint well and have to sense its location through body sensation alone. In pointing style A, the rising velocity has a larger variance than in style B. This may come from the sizes of the body parts moved in pointing: an indicator needs to move the whole arm in pointing style A, while in most cases of style B only the lower arm and hand move.

To analyze such differences between the two pointing styles quantitatively, we estimated the model parameters from the measured samples shown in Fig. 5 based on the method of Kondo et al. (2015). This searches for parameters that minimize the difference between the simulated and measured pointing trajectories.

The unknown parameters in the model are τg, Kg, Tg, ζ, Kp, and Tp. The remaining latency parameters τs and τv, included in Hs and Hv, can be determined by direct measurement. We applied a convergence constraint to decrease the number of degrees of freedom: considering natural pointing situations, the cursor position pe converges on the target position pt after sufficient time has passed, which corresponds to a zero stationary error pt − pe for a step input. The evaluation function E is the residual of the step responses on pe, formulated as

E(\tau_g, K_g, T_g, \zeta, T_p) = \sum_{i=0}^{N} \sum_{t=0}^{T_e} \left( \hat{p}_e(t) - p_e(t) \right)^2,   (5)

where Te, pe(t), and p̂e(t) denote the measurement duration, the measured pointing position, and the simulated value, respectively. Eq. (5) is minimized with respect to τg, Kg, Tg, ζ, and Tp.
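A sketch of this identification step: given a measured step-response trajectory, a derivative-free optimizer searches for the parameter vector minimizing the residual of Eq. (5). The simulator below reuses the loop of Fig. 3 (sensing and display latencies dropped for brevity), and the measured data is replaced by a synthetic stand-in; all names and starting values are assumptions.

import numpy as np
import control
from scipy.optimize import minimize

def simulate_pe(params, t):
    """Step response of the Fig. 3 loop for one parameter vector."""
    tau_g, Kg, Tg, zeta, Tp = params
    Kp = (Kg - 1.0) / Kg                     # convergence constraint
    num, den = control.pade(tau_g, 3)        # dead time of Hg
    Hg = control.tf(num, den) * control.tf([Kg], [Tg**2, 2*zeta*Tg, 1])
    Hp = control.tf([Kp], [Tp, 1])
    _, pe = control.step_response(0.7 * control.feedback(Hg, Hp), t)
    return pe

def residual(params, t, pe_measured):
    # Eq. (5) for a single trial (the paper sums over all trials i)
    return np.sum((simulate_pe(params, t) - pe_measured) ** 2)

t = np.linspace(0.0, 4.0, 400)
pe_measured = simulate_pe([0.35, 1.05, 0.15, 0.65, 0.30], t)  # stand-in data
x0 = [0.40, 1.00, 0.12, 0.80, 0.20]                           # initial guess
fit = minimize(residual, x0, args=(t, pe_measured), method="Nelder-Mead")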

The results of the model parameter estimation, and the step responses simulated with the estimated parameters, are shown in Table 1 and as the red trajectories in Fig. 5. Unfortunately, the simulated trajectories do not describe all the observed characteristics, possibly because the system model does not have enough degrees of freedom to deal with the variety of measured samples. We therefore analyze the optimized parameters only roughly.

Focus on subjects 1 and 2 first. Kg in pointing style B is larger than in style A, while the dead time τg is smaller. This reflects the ease of moving only the lower arm in style B. ζ is related to the fluctuation of the step-response trajectory; its values for style B are smaller than for style A, which corresponds to the larger fluctuation seen in the measured samples. Tp becomes the coefficient of a derivative term in the closed-loop transfer function; it therefore expresses the degree of dependence on prediction of the displayed pointer motion when an indicator changes his or her own pointing posture. An indicator cannot see his or her own elbow well in pointing style B, and thus seems to depend strongly on prediction.

Table 1. The estimated model parameters for the two pointing styles.

Style A: with straight arm
sub.   Kg      Tg      ζ        τg      Tp
1      1.010   115.0   0.6423   381.8   6.491×10⁻³
2      1.010   79.23   0.7348   430.4   5.890×10⁻⁴
3      1.400   8.884   19.27    347.4   0.1317

Style B: with bent elbow
sub.   Kg      Tg      ζ        τg      Tp
1      1.059   153.3   0.6763   318.9   531.3
2      1.056   68.11   0.6232   410.8   256.7
3      1.780   22.36   7.918    306.0   2.344


We did not find such clear results for subject 3. One reason is that subject 3 is not accustomed to the pointing interface system. That the values of ζ and Tp are similar in both pointing styles can be explained by a novice not yet having learned the sensation of body dynamics, and not relying on his own uncertain prediction, respectively.


6. HOW TO ESTIMATE INDICATED POSITION FOR INTERMEDIATE POINTING STYLES

In practical pointing situations, people do not use either pointing style exclusively; intermediate postures with a slightly bent elbow joint often appear. Here we conduct a fundamental investigation to clarify the relationship between the pointing posture and the indicated position in such cases. We focus on whether the estimation model of the indicated position changes smoothly from the eye-reference to the elbow-reference, or switches between them at a particular elbow angle. In the experiments, the positions of the temple, the elbow, and the fingertip are measured by the magnetic field sensors once the indication posture becomes stable for a static pointing target. For various elbow joint angles, estimation errors are calculated as the Euclidean distances between the estimated position and the displayed target position on the screen, and are used to analyze which estimation model is suitable. The error distributions over the elbow joint angle θ for the three subjects are shown in Fig. 6.

Look first at the estimation errors drawn with black circles, which assume the eye-reference model. The errors increase monotonically and smoothly with the elbow joint angle for all subjects. The estimation errors drawn with gray squares, which assume the elbow-reference model, also change smoothly but not monotonically, because the elbow is relatively close to the vector connecting the eye and the fingertip when the elbow angle is small, i.e., the two estimation models are then similar. The smaller error switches from the eye-reference to the elbow-reference model at different angles for the three subjects, approximately 55 degrees for subjects 1 and 2 and 70 degrees for subject 3, but with similar error amounts. As noted in Section 5, subject 3 is a novice with the pointing interface and does not perceive his own body posture well, especially when he bends his elbow. Thus the estimation error with the elbow-reference model around 60 degrees still appears large for him. From the above analysis, a general estimation model for the indicated position in which the reference point moves smoothly from the eye to the elbow may be more suitable than the two existing models.
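One simple instance of such a general model, sketched below, linearly blends the reference point from the eye to the elbow as the elbow joint angle θ grows from 0 to 90 degrees. The linear weighting is our assumption for illustration, not a result established by the measurements above.

import numpy as np

def blended_reference_point(eye, elbow, theta_deg, theta_max=90.0):
    """Reference point that moves smoothly from the eye (theta = 0, style A)
    to the elbow (theta = theta_max, style B) with the elbow joint angle."""
    w = np.clip(theta_deg / theta_max, 0.0, 1.0)
    return (1.0 - w) * np.asarray(eye, float) + w * np.asarray(elbow, float)

# At theta = 55 degrees the reference sits 61% of the way toward the elbow.
ref = blended_reference_point([0.0, 1.60, 2.5], [0.25, 1.35, 2.4], 55.0)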


Fig. 6. The estimation errors of the indicated position for subjects 1-3. The horizontal axis represents the elbow joint angle θ shown in Fig. 2, formed by the shoulder-elbow vector and the elbow-hand vector; 0 and 90 degrees correspond to pointing styles A and B, respectively. The black circles and gray squares show the estimation errors using the eye-reference model and the elbow-reference model, respectively.

7. CONCLUSION

In this paper, we analyzed human pointing behavior with two typical pointing styles in a vision-based pointing interface. The difference in pointing behaviors between the two styles was confirmed in the step responses and in the parameters of the transfer functions of the control model.



The mathematical characteristics of each style obtained from those results match our daily experience of actual remote pointing situations well. From these analyses we confirmed that the parameters in the control model reflect human pointing behaviors in the two pointing styles. However, we also confirmed that the current control model does not describe the trajectory variations well. The method for estimating the model parameters should be improved, because residual minimization in the time domain does not always represent the pointing trajectories well, as shown for subject 3 in Fig. 5 (over-smoothed); minimization in the frequency domain may give a better match. The estimation errors of the indicated position for intermediate pointing styles under the two reference models indicate that a suitable estimation model for arbitrary elbow angles smoothly connects the eye-reference model and the elbow-reference model. Future work is to expand the current control model and the indicated-position model into more general models based on these results, and to design the visualization filter Hv to construct an easy-to-use interface that adapts to various pointing styles.

REFERENCES

Baudisch, P., Cutrell, E., and Robertson, G. (2003). High-density cursor: A visualization technique that helps users keep track of fast-moving mouse cursors. In Proc. of Interact 2003, 236-243.
Blanch, R., Guiard, Y., and Beaudouin-Lafon, M. (2004). Semantic pointing: improving target acquisition with control-display ratio adaptation. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems, CHI '04, 519-526. ACM, New York, NY, USA.
Fitts, P.M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47(6), 381-391.
Fukumoto, M., Suenaga, Y., and Mase, K. (1994). Finger-Pointer: Pointing interface by image processing. Computers & Graphics, 18(5), 633-642.
Grossman, T. and Balakrishnan, R. (2005). The bubble cursor: enhancing target acquisition by dynamic resizing of the cursor's activation area. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems, 281-290. ACM.
Kondo, K., Nakamura, Y., Yasuzawa, K., Yoshimoto, H., and Koizumi, T. (2015). Human pointing modeling for improving visual pointing system design. In Proc. of Int. Symp. on Socially and Technically Symbiotic Systems (STSS) 2015.
McGuffin, M.J. and Balakrishnan, R. (2005). Fitts' law and expanding targets: Experimental studies and designs for user interfaces. ACM Trans. Comput.-Hum. Interact., 12(4), 388-422.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In Proc. of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '11, 1297-1304. IEEE Computer Society, Washington, DC, USA.

Woodworth, R.S. (1899). The accuracy of voluntary movement. Psychological Review Monograph Supplement, 3(13), 1-119.
Worden, A., Walker, N., Bharat, K., and Hudson, S. (1997). Making computers easier for older adults to use: area cursors and sticky icons. In Proc. of the ACM SIGCHI Conference on Human Factors in Computing Systems, 266-271. ACM.
Yoshimoto, M. and Nakamura, Y. (2015). Cooperative gesture recognition: Learning characteristics of classifiers and navigating user to ideal situation. In Proc. of the 4th IEEE International Conference on Pattern Recognition Applications and Methods, 210-218.