Entertainment Computing 1 (2009) 75–84
Hidden Markov Model based gesture recognition on low-cost, low-power Tangible User Interfaces

Piero Zappi *, Bojan Milosevic, Elisabetta Farella, Luca Benini
DEIS, University of Bologna, V.le Risorgimento 2, 40136 Bologna, Italy
Article info

Article history: Received 10 June 2009; Revised 14 September 2009; Accepted 27 September 2009
Keywords: Hidden Markov Models; Smart object; Tangible interfaces; Gesture recognition; Fixed point; Multiple users
Abstract

The development of new human–computer interaction technologies that go beyond the traditional mouse and keyboard is gaining momentum as smart interactive spaces and virtual reality become part of our everyday life. Tangible User Interfaces (TUIs) introduce physical objects that people can manipulate to interact with smart spaces. Smart objects used as TUIs can further improve the user experience by recognizing natural gestures and coupling them to commands issued to the computing system. Hidden Markov Models (HMMs) are a typical approach to recognizing gestures. In this paper, we show how the HMM forward algorithm can be adapted for use on low-power, low-cost microcontrollers without a floating point unit, which can be embedded into many TUIs. The proposed solution is validated on a set of gestures performed with the Smart Micrel Cube (SMCube), a TUI developed within the TANGerINE framework. Throughout the paper we evaluate the complexity of the algorithm and the recognition performance as a function of the number of bits used to represent data. Furthermore, we explore a multiuser scenario where up to four people share the same cube. Results show that the proposed solution performs comparably to the standard forward algorithm run on a PC with double-precision floating point calculations.

© 2009 Published by Elsevier B.V.
1. Introduction

Traditional user interfaces define a set of graphical elements (e.g., windows, icons, menus) that reside in a purely electronic or virtual form. Generic input devices like the mouse and keyboard are used to manipulate these virtual interface elements. Although these interaction devices are useful and even usable for certain types of applications, such as office duties, a broad class of scenarios foresees more immersive environments where the user interacts with the surroundings by manipulating the objects around him/her. Tangible User Interfaces (TUIs) introduce physical, tangible objects that augment the real physical world by coupling digital information to everyday objects. The system interprets these devices as part of the interaction language. TUIs become the representatives of the user navigating in the environment and let him/her work on digital information directly with his/her hands. By manipulating these devices, guided by their physical affordances, people can have more direct access to the functions mapped to different objects.
The effectiveness of a TUI can be enhanced by using sensor-augmented devices. Such smart objects may be able to recognize user gestures and improve the human experience within interactive spaces. Furthermore, the opportunity to execute a gesture recognition algorithm on-board brings several advantages:

1. The stream of sensor readings is not sent over the wireless channel. This reduces radio use and extends the object battery life.
2. The reduced wireless channel usage allows the coexistence of a larger number of objects in the same area.
3. Each object operates independently and in parallel with the others, improving system scalability.
4. The handling of objects moving between different physical environments is facilitated.
5. No other systems, such as video cameras, are required to detect and classify user movements, thus reducing system cost.

The SMCube is a tangible interface developed as a building block of the TANGerINE framework, a tangible tabletop environment where users manipulate smart objects in order to perform actions on the contents of a digital media table [5]. The SMCube is a cubic case with a 6.5 cm edge. At the current stage of development, it is equipped with sensors (a digital tri-axial accelerometer by default) and actuators (infrared LEDs, vibromotors). Data from the accelerometer are used to locally detect the active face (the one directed upwards)
and a set of gestures performed by the user (cube placed on the table, cube held, cube shaken, and tap [8]). This information is wirelessly sent to the base station, which controls the appearance and the elements of the virtual scenario projected on the digital media table. Furthermore, through the LEDs the node can interact with a vision-based system in a multi-modal activity detection scenario.
The recognition algorithms developed in previous work rely on time-invariant features extracted from the acceleration stream. In the more general case, the information used to recognize a gesture is contained in the sequence of accelerations rather than in some particular features. For this class of problems a different family of algorithms has been developed. Among others, Hidden Markov Models (HMMs) and their variants have been extensively used for gesture recognition. HMMs belong to the class of supervised classifiers, thus they require an initial training phase to tune their parameters prior to normal operation. Even if the training of an HMM is a complex task, classification is performed using a recursive procedure called the forward algorithm. Although this process is a lightweight task compared to training, several issues must be considered in order to implement it on a low-power, low-cost microcontroller such as the one embedded in the SMCube.
In this paper, we present our fixed point implementation of the HMM forward algorithm. Throughout the paper we highlight the issues related to the implementation of this algorithm on devices with low computational power and little memory that cannot rely on floating point arithmetic and, starting from the analysis of the standard floating point implementation of the algorithm, we propose a solution to these issues. We evaluate how the use of fixed point variables with limited accuracy (8, 16, or 32 bits) impacts system performance and whether our on-board solution can replace the analysis of data streams performed through the standard algorithm executed on a PC. To characterize the performance of the algorithm we collected a dataset where four users perform a set of four gestures (drawing a circle, a square or an X, and flipping a page) holding the SMCube. The selected gestures are not related to a particular application but can be associated with general meanings (e.g., the X can represent a delete command, and the flip a change of the application background). The recognition performance of the fixed point implementations using different data accuracies has been compared to that of the standard implementation carried out using double precision, taken as the target performance. Two scenarios are considered: the cube is used by a single person, and up to four people share the same object. Furthermore, we characterize the computational and memory cost of the algorithm to evaluate the scalability of our approach in terms of the number of gestures that can be detected by a single cube.
The rest of the paper is organized as follows. The next section presents an overview of the state of the art for TUIs. Section 3 introduces the architecture of the SMCube, while Section 4 focuses on the proposed activity recognition chain and includes an overview of HMMs, the forward algorithm and the data preprocessing step. This section highlights the critical steps for a fixed point implementation. Our fixed point solution is presented in Section 5. Section 6 presents the dataset used for testing. We then characterize the implementation (Section 7) and discuss the results.
Section 8 concludes the paper.
2. Related work

2.1. Tangible User Interfaces (TUIs)

Almost two decades ago, research began to look beyond current paradigms of human–computer interaction based on computers
with a display, a mouse and a keyboard, in the direction of more natural ways of interaction [44]. Since then, concepts such as wearable computing [22] and tangible interfaces [15] have been developed.
The use of TUIs has been proposed in many scenarios where users manipulate virtual environments. This has proved to be useful especially in applications for entertainment and education [14]. An analysis of the impact of TUIs within a school scenario is presented in Ref. [30]. According to this work, research from psychology and education suggests that there can be real benefits for learning from tangible interfaces. An early study on different interaction technologies including TUIs was presented in Ref. [33]. In this study the authors highlight how graspable interfaces push for collaborative work and multiple-hand interaction. In Ref. [42] the authors developed an educational puzzle game and compared two interfaces: a physical one based on TUIs and a screen-based one. Results show that TUIs are an easier means to complete the assigned task and have higher acceptance among the 25 children between 5 and 7 years old involved in the test.
TUIs can enhance the exploration of virtual environments. Virtual heritage (VH) applications aim at making cultural wealth accessible to the worldwide public. Advanced VH applications exploit virtual reality (VR) technology to give users immersive experiences, such as archaeological site navigation, time and space travel, and ancient artifact reconstruction in 3D. Navigation through such virtual environments can benefit from the presence of tangible artifacts like palmtop computers [11] or control objects [13]. In Ref. [31] the Tangible Moyangsung, a tangible environment designed for a group of users that can play fortification games, is presented. By manipulating tangible blocks, people can navigate in the virtual environment, solve puzzles or repair damaged virtual walls in an evocation of historical facts.
Interactive surfaces are a natural choice when developing applications that deal with browsing and exploration of multimedia contents. On these surfaces users can manipulate elements through direct and spontaneous actions. For example, in Ref. [4] multiple users can collaborate within an interactive workspace featuring vision-based gesture recognition to perform knowledge-building activities such as brainstorming. On the reacTable [17] several musicians can share the control of the instrument by caressing, rotating and moving physical artifacts with dedicated functions on the table surface. TViews is an LCD-based framework where users can interact with displayed contents through a set of TUIs (pucks) [24]. The puck is used to select and move virtual objects and its position is tracked using acoustic and infrared technologies. Another example is the Microsoft Surface computing platform [26], where multiple users can share and manipulate digital contents on a multi-touch surface.
The expressiveness of TUIs can be enhanced by the use of smart objects. The MusicCube is a tangible interface used to play digital music like an mp3 player [7]. The cube is able to understand which face points upwards and a set of simple gestures. This ability, together with a set of controls and buttons, is used to choose the desired playlist and to control the music volume. The display cube is a cube-shaped TUI equipped with a three-axis accelerometer, 6 LCD displays (one per face) and a speaker, used as a learning appliance [23].
SmartPuck is a cylindrical multi-modal input–output device with an actuated wheel, a 4-way button, LEDs and a speaker, and it is used on a plasma display panel [20]. SmartPuck allows multiple users to interact with menus and applications and has been tested by using it to navigate within the Google Earth program in place of a standard mouse and desktop setup. Some commercial devices follow the same trend: the Wiimote(TM) is a controller developed by Nintendo for its Wii(TM) console [28]. This controller embeds an accelerometer, an infrared camera and a Bluetooth transceiver and is used to interact with a large number of applications and videogames.
2.2. Gesture recognition

Gesture recognition algorithms are typically made up of four steps: data acquisition from the sensors, data preprocessing to reduce noise, extraction of relevant features from the data stream, and classification. Several design choices are available at each step, depending on the application scenario, the activities that have to be recognized, and the available computational power. When features are time invariant (e.g., zero crossing rate or frequency spectrum), simple time-independent classifiers can be used (e.g., linear classifiers, such as Support Vector Machines, or decision trees, such as C4.5). In the more general case, features are time dependent and classifiers suited for temporal pattern recognition are used. Typical approaches include dynamic time warping [21], neural networks [3], conditional random fields [25] and Hidden Markov Models (HMMs) [45].
Even if several classification algorithms have been proposed for implementation on smart objects [35,34,16], the solutions proposed to recognize gestures performed with TUIs typically rely on vision systems [40,18,36], on the collection and processing of data on an external PC [2,29,43], or on the recognition of simple gestures through the analysis of time-invariant features [46,7,9]. An exception to the previously cited papers is the work proposed by Ueda et al. [41]. In this work the authors present the m-ActiveCube, a physical cube equipped with sensors (ultrasonic, tactile and gyroscopes) and actuators (buzzer, LEDs and motors) that acts as a bidirectional user interface toward a 3D virtual space. Multiple cubes can be connected and collaborate in achieving a defined task. In that paper the authors evaluate a fixed point implementation of HMMs able to perform speech recognition. Since the proposed algorithm cannot be implemented on a single cube, the basic idea is to balance the computation among several cubes. One of the main limits of this work is that the critical issue of data synchronization among the different cubes that participate in the computation is not considered. Furthermore, the authors assume that all the cubes always participate in the speech recognition, so every node of the network is a point of failure for the whole system. Finally, the recognition ratio of the proposed algorithm is not evaluated, therefore it is not possible to assess the performance of this solution.
In contrast to the work presented in Ref. [41], here we present an algorithm able to recognize complex gestures that can be implemented on a single cube with much lower computational power and memory than those available on the m-ActiveCubes. As a consequence, in our solution each cube is independent from the others, hence (1) it does not need any synchronization, (2) it is not a point of failure for the whole system, (3) multiple users can operate on the tabletop at the same time, and (4) the wireless communication need is reduced (only indications of gestures are sent), resulting in longer battery life and reduced interference. To the authors' best knowledge, no previous work has fully embedded such an activity recognition algorithm on low-power, low-cost 8-bit microcontrollers.
3. Smart Micrel Cube

The Smart Micrel Cube (SMCube) is a cube-shaped artifact with a matrix of infrared emitter LEDs on each face (see Fig. 1). It embeds a low-cost, low-power 8-bit microcontroller (Atmel ATmega168 [1]), a Bluetooth transceiver (Bluegiga WT12 Bluetooth module [6]) that supports the Serial Port Profile (SPP) and a MEMS tri-axial accelerometer (STM LIS3LV02DQ [39]) with a programmable full scale of 2g or 6g and digital output. The cube is powered by a 1000 mAh, 4.2 V Li-ion battery, with which it reaches up to 10 h of autonomy during normal operation.
The ATmega168 features a RISC architecture that can operate at up to 20 MHz and offers 16 KB of Flash memory, 1 KB of RAM and 512 bytes of EEPROM. The microcontroller includes a multiplier and several peripherals (ADC, timers, SPI and UART serial interfaces, etc.). In our prototype the CPU operates at 8 MHz. The firmware has been implemented in C using the Atmel AVR Studio 4 IDE which, used in conjunction with avr-libc and the gcc compiler, provides all the APIs necessary to exploit the peripherals and perform operations on 8-, 16-, and 32-bit variables. Being written in C, the code is portable to other devices. A wireless bootloader is used to load new firmware onto the cube without the need to disassemble it. Each cube is identified by an ID number stored in the cube flash memory, which helps disambiguation when more than one cube is present on the scene at a given time.
The LED pattern on every face of the cube is composed of 8 points (see Fig. 2). In the basic configuration only points p1, p2, p3, p5 are switched on; the remaining points are used as a binary encoding of the cube id (note that this visual id is not related to the cube id stored in the MCU flash). In addition to the LEDs, actuation can be provided through six vibromotors mounted on the cube faces.
The choice of the SMCube for this work is motivated by the fact that it is representative of a wide set of smart objects, since it embeds low-power, low-cost hardware that can be integrated in a large number of devices. Therefore, the algorithm presented in this paper can be used in a wide range of TUIs.

Fig. 1. The TANGerINE SMCube. The cube edge is 6.5 cm long. On the top left, the inner surface of the master face, which includes the microcontroller, the accelerometer and the transceiver. On the top right, the inner surface of the other five faces of the cube.

Fig. 2. SMCube LED patterns.
4. Activity recognition chain

Typical activity recognition systems are made up of four steps: (1) data preprocessing, (2) segmentation, (3) feature extraction, (4) classification. At each step several design choices must be taken. In this work, we do not address the problem of segmentation and we deal with isolated gestures. Yet, other works in the literature propose solutions to this task. For example, different approaches for time series segmentation are presented in Ref. [19], an optimized approach for low-power wearable systems can be found in Ref. [38] and an HMM-based algorithm for hand gesture segmentation is proposed in Ref. [12]. In the following sections, we provide details on the classification, data preprocessing and feature extraction steps.

4.1. Hidden Markov Models

HMMs are often used in activity recognition since they tend to perform well with a wide range of sensor modalities [37] (they are also used successfully in other problem domains, such as speech recognition, for which they were initially developed [27]). Several variants of HMMs have been proposed to address shortcomings of the traditional algorithm. For example, explicit state durations or null state transitions can be defined to better model speech signals [32], coupled HMMs (CHMMs) have been designed to better characterize multiple interdependent sequences [47], and factorial HMMs (FHMMs) and parallel HMMs (PHMMs) have been developed to combine multiple features in an efficient way but at different levels of abstraction (FHMM at feature level, PHMM at decision level) [10]. Therefore, the selection of the best model depends on the application and the set of gestures that we want to recognize. Since in this work we want to evaluate the consequences, in terms of performance loss, of implementing an algorithm from the HMM family on low-power hardware without a floating point unit, we do not analyze specific variants and we investigate the basic, standard, ergodic (fully connected) version of the HMM.
A Hidden Markov Model (HMM) is a statistical model that can be used to describe a physical phenomenon through a set of stochastic processes. It is composed of a finite set of $N$ states ($q_t = s_i$, with $1 \le i \le N$). At every time step the state of the system changes. Transitions among the states are governed by a set of transition probabilities, $A = \{a_{ij}\} = \{P(q_{t+1} = s_j \mid q_t = s_i)\}$ (the probability that the system is in state $i$ at time $t$ and in state $j$ at time $t+1$). At every time step an outcome $o_t$ is generated according to the associated observation probabilities, $B = \{b_i\} = \{P(o \mid q = s_i)\}$ (the probability of observing symbol $o$ while the system is in state $i$). An ergodic HMM has $a_{ij} \neq 0 \;\forall i,j$. Outcomes can belong either to a continuous domain (in which case the $b_i$ are probability density functions) or to a discrete domain (in which case we have $M$ symbols $b_i(k)$, with $1 \le k \le M$). In the former case we deal with continuous HMMs, in the latter with discrete HMMs. Only the outcome, not the state, is visible to an external observer, and therefore the states are "hidden" from the outside. Furthermore, a model is defined by the starting probabilities $\Pi = \{\pi_i\} = \{P(q_1 = s_i)\}$. The compact notation for an HMM is $\lambda = (A, B, \Pi)$.
Training of an HMM is carried out using an iterative algorithm called the Baum–Welch, or Expectation–Maximization, algorithm and a set of reference instances.
Once a set of models has been trained (one for each class that we want to recognize), classification of new instances is performed using the forward algorithm.
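As an illustration only (the paper does not give data structures), the trained parameters of one discrete HMM can be held on the SMCube's microcontroller in a small C structure such as the hypothetical one below; the field names, sizes and the fixed point encoding are our assumptions, chosen to match the notation just introduced.

```c
#include <stdint.h>

#define N_STATES  10  /* N: number of hidden states (value selected later in the paper) */
#define N_SYMBOLS  3  /* M: discrete observation symbols '-', '0', '+' (Section 4.2)    */

/* Hypothetical container for one trained discrete HMM, lambda = (A, B, Pi).
 * Each probability is stored as a 16-bit fixed point fraction in [0, 1),
 * e.g. p_fix = (uint16_t)(p * 65535.0). */
typedef struct {
    uint16_t A[N_STATES][N_STATES];   /* a_ij  = P(q_{t+1} = s_j | q_t = s_i) */
    uint16_t B[N_STATES][N_SYMBOLS];  /* b_i(k) = P(o_t = k | q_t = s_i)      */
    uint16_t Pi[N_STATES];            /* pi_i  = P(q_1 = s_i)                 */
} discrete_hmm_t;
```

One such structure is needed per gesture class, which is where the factor C in the memory cost of Section 5.1 comes from.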
To clarify the notation used, we report only the forward algorithm below.

4.1.1. The forward algorithm

The forward algorithm is a recursive algorithm that relies on a set of support variables $\alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = s_i \mid \lambda)$ and allows finding the probability $P(O \mid \lambda)$ that a certain model generated an input sequence. It is made up of three steps:

1. Initialization: $\alpha_1(i) = \pi_i \, b_i(o_1)$, for $1 \le i \le N$.
2. Induction: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(o_{t+1})$, for $1 \le t \le T-1$ and $1 \le j \le N$.
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.

The $\alpha_t(j)$ are sums of a large number of terms of the form

$\left( \prod_{r=1}^{t-1} a_{s(r),s(r+1)} \right) \left( \prod_{r=1}^{t} b_{s(r)}(o_r) \right)$    (1)

where $s(1), \ldots, s(t)$ is a state sequence.
Since both the $a_{ij}$ and the $b_i(k)$ are smaller than 1, as $t$ becomes large $\alpha_t(j)$ tends to zero exponentially and soon exceeds the precision of any machine. In order to avoid underflow, the $\alpha_t(j)$ are normalized at every step using the scaling factor $c_t = 1 / \sum_{i=1}^{N} \alpha_t(i)$. The scaled $\hat{\alpha}_t(j) = c_t \, \alpha_t(j)$ are used in place of the $\alpha_t(j)$.
When using normalization we cannot simply sum the $\hat{\alpha}_T(i)$ in the termination step, since their sum is equal to 1. However, we can notice the following [32]:

$\sum_{i}^{N} \hat{\alpha}_T(i) = \prod_{t=1}^{T} c_t \sum_{i}^{N} \alpha_T(i) = C_T \sum_{i}^{N} \alpha_T(i) = 1$    (2)

$\prod_{t=1}^{T} c_t \; P(O \mid \lambda) = 1$    (3)

$P(O \mid \lambda) = \frac{1}{\prod_{t=1}^{T} c_t}$    (4)

$\log[P(O \mid \lambda)] = -\sum_{t=1}^{T} \log[c_t]$    (5)

The use of the logarithm in the last step is necessary in order to avoid underflow, since $P(O \mid \lambda) = \prod_{t=1}^{T} 1/c_t$ is a product of factors smaller than 1 and tends to zero exponentially.
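For reference, a minimal floating point sketch of the scaled forward algorithm of Eqs. (2)–(5) is shown below. It is our own illustration of the textbook procedure (the double-precision baseline used later in the paper), not the authors' code; the function and array names are placeholders.

```c
#include <math.h>

#define N 10   /* number of states (assumption)  */
#define M 3    /* number of symbols (assumption) */

/* Scaled forward algorithm for a discrete HMM lambda = (A, B, Pi).
 * obs[] contains T >= 1 observation symbols in 0..M-1.
 * Returns log P(O | lambda) = -sum_t log(c_t), as in Eq. (5). */
double forward_log_prob(const double A[N][N], const double B[N][M],
                        const double Pi[N], const unsigned char *obs, int T)
{
    double alpha[N], tmp[N];
    double log_prob = 0.0;

    /* Initialization: alpha_1(i) = pi_i * b_i(o_1). */
    for (int i = 0; i < N; i++)
        alpha[i] = Pi[i] * B[i][obs[0]];

    for (int t = 0; ; t++) {
        /* Scaling: c_t = 1 / sum_i alpha_t(i); accumulate log(1/c_t). */
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += alpha[i];
        log_prob += log(sum);
        for (int i = 0; i < N; i++)
            alpha[i] /= sum;

        if (t == T - 1)
            break;

        /* Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1}). */
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int i = 0; i < N; i++)
                s += alpha[i] * A[i][j];
            tmp[j] = s * B[j][obs[t + 1]];
        }
        for (int j = 0; j < N; j++)
            alpha[j] = tmp[j];
    }
    return log_prob;
}
```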
Note that no constraint is placed on the length of the sequence to classify.

4.2. Data preprocessing and feature extraction

The input sequence of acceleration triplets should be preprocessed to highlight important signal properties while reducing the input dimensionality. In particular, we are interested in using discrete features, since discrete HMMs are much less computationally demanding than HMMs operating on continuous observations. Furthermore, we must note that when we perform a gesture using the cube, its initial orientation strongly affects the sequences associated with the gesture. For example, if we consider the trivial gesture where a user lifts the cube and puts it down, according to which face is directed upwards we can obtain six different sequences from the accelerometer (two for each axis). Moreover, the design space further increases when considering more complex gestures, such as the ones presented in Section 6. For these reasons, we decided to use only the magnitude of the acceleration, calculated as the sum of the squares of the accelerations along the three axes. The final square root is not computed. This results in reduced computational cost, since the number of streams to process diminishes from three (one for each axis) to
one, and in sequences that are independent of the initial orientation, i.e., they do not depend on how the user holds the cube.
In order to use a discrete model, we rely on discrete feature symbols that indicate the acceleration direction (e.g., negative acceleration, positive acceleration, and no acceleration). The conversion of a magnitude sample $a_t$ into a feature symbol $f_t$ is done by means of a single threshold $\Delta_R$ as follows:

$f_t = \begin{cases} - & \text{for } a_t < R - \Delta_R \\ 0 & \text{for } R - \Delta_R \le a_t \le R + \Delta_R \\ + & \text{for } a_t > R + \Delta_R \end{cases}$    (6)

where $R$ is the acceleration magnitude when no movement is performed ($1g$).
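A possible C implementation of this preprocessing step is sketched below (our own illustration: the raw unit scale, threshold constants and names are assumptions; the paper only prescribes that the square root is skipped and the threshold is adapted to the squared values, see Section 7.2).

```c
#include <stdint.h>

/* Squared thresholds (R - Delta_R)^2 and (R + Delta_R)^2 of Eq. (6), in raw
 * sensor units. The values below assume 1 g = 1000 raw units and
 * Delta_R = 250 mg (the threshold chosen in Section 7.1); adapt them to the
 * accelerometer's actual sensitivity. */
#define R2_LOW   562500UL    /* (1000 - 250)^2 */
#define R2_HIGH  1562500UL   /* (1000 + 250)^2 */

typedef enum { SYM_MINUS, SYM_ZERO, SYM_PLUS } symbol_t;

/* Ternarization of one acceleration triplet: squared magnitude, then Eq. (6). */
symbol_t ternarize(int16_t ax, int16_t ay, int16_t az)
{
    uint32_t mag2 = (uint32_t)((int32_t)ax * ax)   /* sum of squares,   */
                  + (uint32_t)((int32_t)ay * ay)   /* no square root    */
                  + (uint32_t)((int32_t)az * az);  /* (Section 4.2)     */

    if (mag2 < R2_LOW)  return SYM_MINUS;   /* a_t < R - Delta_R */
    if (mag2 > R2_HIGH) return SYM_PLUS;    /* a_t > R + Delta_R */
    return SYM_ZERO;
}
```

This matches the cost reported in Section 7.2: three multiplications, two sums and two comparisons per sample.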
5. Fixed point solution

The low-power microcontroller embedded in the SMCube includes a multiplier but not a divider. Therefore, it can efficiently compute the steps required by the forward algorithm for discrete HMMs, but not those of the standard normalization procedure. In fact, as shown in the previous section, that procedure requires performing $N$ divisions each time a new sample is processed.
To find a solution suitable for our MCU we must notice that the objective of the normalization procedure is simply to keep the $\hat{\alpha}_t(j)$ within the range of the machine. Thus, it is not necessary that $\sum_{i}^{N} \hat{\alpha}_t(i) = 1$. We propose an alternative scaling procedure:

1. at each time step $t$, once the $\alpha_t(i)$ have been computed, check whether the highest $\alpha_t(i)$ is smaller than $\frac{1}{2}$; otherwise scaling is not needed at this step;
2. calculate the number of left shifts $l_t$ needed to render the highest $\alpha_t(i)$ greater than $\frac{1}{2}$;
3. shift all $\alpha_t(i)$ to the left by $l_t$ bits.

If, at a certain time $t$, all the $\alpha_t(i)$ are equal to zero, they are all set to $\frac{1}{2}$ and $l_t$ is set to the number of bits with which we represent our data (the data size). This procedure requires only shifts and can be efficiently implemented on the low-power microcontroller embedded in the SMCube.
Another problem arises when we need to compute the logarithm of the $c_t$ (see Eq. (5)). However, the proposed scaling procedure eases this task. In fact, in this case the final probability is given by $\log P(O \mid \lambda) = \log(r) - \sum_{t=1}^{T} \log 2^{l_t}$, where $r = \sum_{i}^{N} \hat{\alpha}_T(i) \neq 1$. We already have the value of $\sum_{t=1}^{T} \log 2^{l_t} = (\sum_{t=1}^{T} l_t) \log 2$ simply by keeping track of how many shifts we performed for scaling. Furthermore, we do not need to compute $\log(r)$, since the logarithm is a monotonically increasing function. Thus, to compare two models, we simply check which one required fewer shifts for scaling; in case of a tie, the one with the higher $r$ is the model with the higher $P(O \mid \lambda)$.
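The scaling procedure above can be coded with shifts only. The following is an illustrative 16-bit sketch under our own assumptions (the $\alpha_t(i)$ held as unsigned Q0.16 fractions, so that 1/2 corresponds to 0x8000); the paper does not publish its firmware, so names and details are ours.

```c
#include <stdint.h>

#define N         10   /* number of HMM states (assumption)           */
#define DATA_BITS 16   /* width of the fixed point data (16-bit case) */

/* Shift-based scaling of the alpha_t(i) (steps 1-3 above).
 * Returns l_t, the number of left shifts applied; the caller accumulates
 * sum_t l_t, since log P(O|lambda) = log(r) - (sum_t l_t) * log 2. */
uint8_t scale_alphas(uint16_t alpha[N])
{
    const uint16_t HALF = (uint16_t)1u << (DATA_BITS - 1);
    uint16_t max = 0;
    uint8_t  lt = 0;

    /* Find the largest alpha_t(i). */
    for (uint8_t i = 0; i < N; i++)
        if (alpha[i] > max)
            max = alpha[i];

    if (max == 0) {
        /* Degenerate case: all alphas are zero. Set them to 1/2 and
         * charge a full data-size worth of shifts, as described above. */
        for (uint8_t i = 0; i < N; i++)
            alpha[i] = HALF;
        return DATA_BITS;
    }

    if (max >= HALF)
        return 0;   /* already >= 1/2: no scaling needed at this step */

    /* Number of left shifts needed to bring the maximum above 1/2 ... */
    for (uint32_t m = max; m < HALF; m <<= 1)
        lt++;

    /* ... and apply it to every alpha_t(i). */
    for (uint8_t i = 0; i < N; i++)
        alpha[i] = (uint16_t)((uint32_t)alpha[i] << lt);

    return lt;
}
```

To compare two gesture models at the end of a sequence, it is then enough to compare the accumulated shift counts and, in case of a tie, the final sums r, exactly as described above.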
5.1. Forward algorithm complexity

Classification of a new instance using HMMs is performed by computing, through the forward algorithm, the probability that the input sequence is generated by each model associated with a gesture. The instance is classified as belonging to the class of the model that yields the highest probability. Therefore, once we have detected the beginning of a new gesture, each time the MCU samples new data from the accelerometer it must preprocess the input data, execute one step of the forward algorithm with all models and normalize the $\alpha_t(i)$ of all the models.
According to the standard algorithm presented above, one step of the forward algorithm (i.e., calculating $\alpha_{t+1}(i)$, $1 \le i \le N$) requires:

1. the product of an $N \times N$ matrix (the transition probability matrix $A$) with the old $N \times 1$ vector of the $\alpha_t$ ($N^2$ multiplications and $(N-1)$ sums);
2. an element-by-element product of the resulting vector with the column of the observation probability matrix $B$ associated with the output $o_{t+1}$ ($N$ multiplications and $(N-1)$ sums).

The scaling algorithm first finds the highest $\alpha_t(i)$, then computes the number of shifts needed and finally shifts all the $\alpha_t(i)$. To execute this procedure, in the worst case, we need $N-1$ variable comparisons to find the highest value and (data size $-$ 1) comparisons to find the number of shifts. Finally, we perform $N \cdot$ (data size $-$ 1) shifts.
In Table 1 we present the computational cost of the basic operations used to evaluate the complexity of our implementation. A summary of the complexity of the steps outlined above is presented in Table 2 (where $C$ is the number of gestures we want to recognize). The memory cost is given by (data size / 8) $\cdot C \cdot (N^2 + N M + N)$ bytes. The models can be stored either in the MCU RAM, Flash or EEPROM.
6. Experimental setup

The algorithm is validated against a set of four gestures performed by four people working in our laboratory. Each tester performed 60 repetitions of each gesture, for a total of 1200 instances. Even if all the performers work in the field of computer engineering, they were asked to perform the gestures without any particular training except a single initial visual demonstration. The four gestures selected for our tests are shown in Table 3. For the purpose of evaluating our fixed point implementation against the standard one, the particular set of gestures chosen is not crucial. Therefore, we selected a set of movements that can be representative of the ones used to interact with computing systems. For example, the gestures Square and Circle could represent actions like cut or copy on a set of selected items, the 'X' could mean deleting a set of items, and the Flip changing the context of an application (e.g., the background). An analysis of the computational and memory cost as a function of the number of gestures that have to be detected is presented in the following sections.
During our tests the accelerometer on the SMCube has been sampled at 100 Hz. Raw data have been sent via Bluetooth to a base PC. This enables the use of this data set as a reference and the later assessment, through simulations, of the effect of various types of data representation and processing.
Table 1
Basic operations computational complexity.

Operation                 Cost
Shift                     1
Variable compare          1
Sum 8 bits                1
Sum 16 bits               4
Sum 32 bits               8
Multiplication 8 bits     4
Multiplication 16 bits    15
Multiplication 32 bits    35

Table 2
Algorithm complexity.

Algorithm                  Cost
alpha_{t+1} calculation    (N^2 + N) mul + 2(N^2 - 1) sum
Normalization              (N - 1) + (N + 1)(data size - 1)
Single step (8 bit)        C [6N^2 + 12N + 4]
Single step (16 bit)       C [23N^2 + 31N + 7]
Single step (32 bit)       C [51N^2 + 67N + 14]
Manual data segmentation and labeling was done by a test supervisor through a simple acquisition application running on the PC. This allows obtaining reliable ground truth and separating the problem of gesture classification from that of data segmentation.
7. Tests and simulations

Our objective is to understand how our implementation, which uses fixed point data and the proposed scaling technique, performs with respect to a reference implementation that follows the standard algorithm and uses double precision for data representation. Therefore, we used the collected dataset to train a set of HMMs for each tester using floating point notation with double precision. Each model has been trained using 15 reference instances, 15 loops of the Baum–Welch training algorithm, and 10 initial random models. We used twofold cross-validation so that all the available instances could be used for validation. Thus, the instances have been divided into two groups and two models have been trained, each one using a different group of instances. The models have been validated on the group of instances not used for training. As a consequence, we can report results over the whole dataset. From now on we refer to these models as floating point models and we use them in all tests to provide reference results.
The same models have been converted into fixed point notation using different accuracies (8, 16, and 32 bits) and the accuracy of these models has been compared to that of the floating point models. Performance is evaluated using the following indexes:

- Correct classification ratio: $CCR = \frac{\text{number of correctly classified instances}}{\text{total number of instances}}$; it is a global indication of the performance of the classifier.
- Precision: $PR_i = \frac{\text{number of instances correctly classified for class } i}{\text{number of instances classified as class } i}$; it is an indication of the exactness of the classifier.
- Recall: $RC_i = \frac{\text{number of instances correctly classified for class } i}{\text{total number of instances from class } i}$; it is an indication of the performance of the classifier over a specific class.
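As a self-contained illustration of how these indexes are obtained from a confusion matrix (our own example code, not part of the paper's toolchain), with conf[i][j] counting the instances of class i that were classified as class j:

```c
#define N_CLASSES 4   /* circle, square, X, flip */

/* Correct classification ratio: correct classifications over all instances. */
double ccr(const unsigned conf[N_CLASSES][N_CLASSES])
{
    unsigned correct = 0, total = 0;
    for (int i = 0; i < N_CLASSES; i++)
        for (int j = 0; j < N_CLASSES; j++) {
            total += conf[i][j];
            if (i == j)
                correct += conf[i][j];
        }
    return (double)correct / (double)total;
}

/* Precision of class c: correct c over everything classified as c. */
double precision(const unsigned conf[N_CLASSES][N_CLASSES], int c)
{
    unsigned classified_as_c = 0;
    for (int i = 0; i < N_CLASSES; i++)
        classified_as_c += conf[i][c];
    return (double)conf[c][c] / (double)classified_as_c;
}

/* Recall of class c: correct c over all true instances of class c. */
double recall(const unsigned conf[N_CLASSES][N_CLASSES], int c)
{
    unsigned true_c = 0;
    for (int j = 0; j < N_CLASSES; j++)
        true_c += conf[c][j];
    return (double)conf[c][c] / (double)true_c;
}
```

Applied to the floating point confusion matrix of Table 11(a), for instance, ccr() returns (49 + 50 + 1 + 19)/240 which is approximately 0.496, the value reported there.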
7.1. Parameter selection

Before evaluating our implementation, we selected the threshold for ternarization ($\Delta_R$) and the number of states of the HMMs ($N$). In order to select these parameters, for each user we built several models, sweeping the threshold value from $\Delta_R = 100$ mg to $\Delta_R = 700$ mg in 50 mg steps and the number of HMM states from 3 to 10. We evaluated the performance of each model in classifying the instances from the same user. The threshold/number-of-states pair that resulted in the best average CCR among the four testers was chosen. Table 4 shows the results of our simulations. According to these results we adopted $\Delta_R = 250$ mg and $N = 10$.
Table 3
List of activities to recognize.

Gesture circle: The tester picks up the cube from the table, draws a single clockwise circle on the vertical plane in front of him, then puts the cube down. No constraints have been posed on the circle size.
Gesture square: The tester picks up the cube from the table, draws a single clockwise square on the vertical plane in front of him, then puts the cube down. No constraints have been posed on the square size or the side length.
Gesture X: The tester picks up the cube from the table, draws an X on the vertical plane in front of him, then puts the cube down. The X is executed starting from the line that goes from the upper-right corner to the lower-left corner.
Gesture flip: The tester picks up the cube from the table, simulates the flip of a book page ideally placed in front of him, then puts the cube down.
7.2. Data preprocessing cost

The preprocessing of the data requires the calculation of the magnitude and the ternarization of the resulting value. Since the square root is a monotonic function, we can skip it in the calculation of the magnitude and adapt the threshold $\Delta_R$ to the squared values. Therefore, the whole preprocessing requires only three multiplications and two sums for the magnitude and two comparisons for the ternarization. According to the metrics proposed in Table 1, the computational cost of this step is presented in Table 5. In the same table we also report the number of CPU cycles and the execution time needed to preprocess one input triplet of accelerations.
7.3. Forward algorithm computational and memory cost

We evaluated the time needed by the SMCube microcontroller to execute one step of the forward algorithm. Table 6 presents the number of CPU cycles and the time needed to perform such a step at 8 MHz. The results presented in Table 6 refer to a single HMM. Here we are classifying four gestures; as a consequence, when sampling the data at 100 Hz with the MCU running at 8 MHz we cannot execute the 16 and 32 bit versions of the algorithm in real time. This is not a problem, because the ATmega168 clock can be increased up to 20 MHz. At this speed the 16 bit version can be executed in real time (see Table 6). Furthermore, we can reduce the sampling rate: with a sampling rate of 20 Hz we are able to implement the 32 bit version. The memory cost of the three implementations is presented in Table 7.
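As a back-of-the-envelope check of these claims (our own arithmetic, based on the per-model times of Table 6 and the C = 4 models used here):

- at 100 Hz the time budget per sample is 10 ms; the 16 bit version at 8 MHz takes $4 \times 4.661 \text{ ms} \approx 18.6$ ms $> 10$ ms, so real-time operation is not possible;
- at 20 MHz the same version takes $4 \times 1.864 \text{ ms} \approx 7.5$ ms $< 10$ ms, which fits the budget;
- the 32 bit version at 20 MHz takes $4 \times 12.01 \text{ ms} \approx 48$ ms, which does not fit 10 ms but does fit the 50 ms budget available at a 20 Hz sampling rate.

Preprocessing (Table 5) adds well under 0.5 ms per sample and does not change this picture.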
Table 4
Average CCR for different combinations of threshold and number of states. The best model (marked with *) is obtained for a threshold of 250 mg and 10 states.

Th. (mg)   Number of states
           3      4      5      6      7      8      9      10
100        0.600  0.556  0.566  0.566  0.615  0.574  0.576  0.589
150        0.606  0.620  0.616  0.631  0.627  0.639  0.636  0.653
200        0.629  0.689  0.636  0.677  0.678  0.695  0.668  0.691
250        0.629  0.707  0.694  0.723  0.728  0.730  0.744  0.767*
300        0.684  0.717  0.713  0.734  0.752  0.744  0.749  0.747
350        0.591  0.698  0.746  0.731  0.748  0.755  0.741  0.752
400        0.614  0.689  0.711  0.697  0.711  0.713  0.723  0.717
450        0.541  0.585  0.686  0.670  0.683  0.702  0.714  0.721
500        0.541  0.641  0.708  0.683  0.730  0.703  0.714  0.721
550        0.530  0.641  0.670  0.689  0.690  0.700  0.692  0.693
600        0.555  0.605  0.635  0.660  0.678  0.667  0.703  0.682
650        0.565  0.584  0.594  0.650  0.622  0.650  0.649  0.653
700        0.568  0.583  0.625  0.658  0.665  0.642  0.674  0.676
Table 5
Preprocessing complexity.

Data size                Cost   CPU cycles   Time (µs)
Preprocessing (8 bit)    16     64           8
Preprocessing (16 bit)   55     704          88
Preprocessing (32 bit)   123    3120         390
Table 6
Execution time of a single step of the forward algorithm with N = 10. This time refers to the evaluation of the alpha_{t+1} and the normalization procedure for a single model.

Algorithm              CPU cycles   T (ms) @ 8 MHz   T (ms) @ 20 MHz
Single step (8 bit)    2088         0.261            0.104
Single step (16 bit)   37,288       4.661            1.864
Single step (32 bit)   240,240      30.03            12.01
Table 7
Memory cost.

Variable size (bits)   Memory cost (bytes)
                       N = 10     N = 7
8                      560        308
16                     1120       616
32                     2240       1232
On the SMCube we can store the models either in the flash memory (as constants), in the EEPROM, or load them at startup and keep them in RAM. The ATmega168 has enough flash memory to store all versions of the HMMs with $N = 10$. If we want to change the models dynamically, the whole code stored in flash must be updated using the bootloader. The duration of this operation is approximately 2 min, so it may not be suitable for certain applications. To speed up the update process, the models can be stored in the EEPROM or loaded through the wireless channel at startup into RAM. If we use HMMs with 10 states, the 8 bit models can be stored in RAM. However, Table 4 shows that HMMs with 7–9 states present recognition ratios comparable to the 10-state ones. Therefore, we can reduce the number of states $N$ while still achieving a sufficient recognition rate. For example, the use of HMMs with $N = 7$ (see Table 7) allows the 8 bit models to be stored in the EEPROM and both the 8 and 16 bit models in RAM. Note that reducing the number of states also reduces the computational cost (which decreases with the square of $N$, see Table 2), thus relaxing the constraints on the CPU clock or on the sampling rate.
Note that both the computational and the memory cost increase linearly with the number of gestures to be recognized and with the square of the number of states (see Section 5.1). Therefore, when considering applications that use more than four gestures, we can reduce the number of states of the HMMs to fulfill the memory and computational constraints imposed by the selected hardware.
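As a hypothetical illustration of the flash option on this MCU family, avr-libc's program-space facilities allow the tables to be kept out of RAM; the identifiers and placeholder contents below are ours, not the paper's actual trained models.

```c
#include <avr/pgmspace.h>
#include <stdint.h>

#define N 7   /* reduced number of states, as discussed above */
#define M 3   /* observation symbols                          */

/* One 8-bit model kept in flash instead of RAM (placeholder contents). */
const uint8_t A_circle[N][N] PROGMEM = { { 0 } };  /* fill with trained a_ij   */
const uint8_t B_circle[N][M] PROGMEM = { { 0 } };  /* fill with trained b_i(k) */
const uint8_t Pi_circle[N]   PROGMEM = { 0 };      /* fill with trained pi_i   */

/* Coefficients are read back one byte at a time during the forward step;
 * each access costs a few extra cycles compared with a RAM read. */
static inline uint8_t a_ij(uint8_t i, uint8_t j)
{
    return pgm_read_byte(&A_circle[i][j]);
}
```

Storing the models in EEPROM or downloading them over Bluetooth into RAM at startup, as suggested above, only changes where the tables live and how they are read (e.g., via avr-libc's eeprom_read_byte()), not the recognition code itself.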
7.4. Single user activity recognition

To assess the influence of using fixed point notation, we evaluated the PR, RC, and CCR using a set of gestures from the same performer whose gestures were used to train the HMMs. Table 8 presents, as an example, the indexes for user 1. As can be seen from this table, the 16 and 32 bit implementations show performance comparable to that of the floating point implementation, while using only 8 bits for data representation results in a drop of more than 20% in CCR. Fig. 3 compares the CCR of the fixed point implementations with that of the floating point one.
Table 8
Classification performance for user 1.

(a) Precision
Class            PR 8b   PR 16b   PR 32b   PR fl
Gesture circle   0.612   0.759    0.759    0.757
Gesture square   0.603   0.789    0.789    0.836
Gesture X        0.446   0.804    0.804    0.884
Gesture flip     0.762   0.875    0.875    0.917

(b) Recall and correct classification ratio
Class            RC 8b   RC 16b   RC 32b   RC fl
Gesture circle   0.500   0.683    0.683    0.883
Gesture square   0.633   0.933    0.933    0.933
Gesture X        0.483   0.683    0.683    0.633
Gesture flip     0.800   0.933    0.933    0.917
CCR              0.604   0.808    0.808    0.842
Fig. 3. Comparison of the correct classification ratio of the fixed point implementations and the floating point one (dashed line) when different variable sizes are used.
From this figure, it is clear that the 16 and 32 bit solutions show accuracies comparable with the floating point one. According to Table 8, we can use 16 bits for our data representation with a minimal reduction in recognition accuracy, while decreasing the memory cost by 50% and the computational cost by 84% (see Tables 6 and 7).
For some users we noticed that the performance of the 8 bit classifier was higher than that of the other implementations. This is the case of user 2, as shown in Table 9. This behavior is tied to the fact that HMMs are a representative model of the gestures based on the training set. Furthermore, the Baum–Welch training algorithm does not guarantee that we find the global maximum of the likelihood, but only a local one. Therefore, the error introduced by the imperfect data representation may affect the likelihood evaluation in a way that increases the recognition performance on the validation set.
Table 10
Multiuser scenario classification performance (CCR). Rows: training set; columns: validation set.

(a) Floating point implementation
Training set   Usr. 1   Usr. 2   Usr. 3   Usr. 4
Usr. 1         0.842    0.496    0.475    0.388
Usr. 2         0.408    0.858    0.375    0.221
Usr. 3         0.438    0.308    0.704    0.358
Usr. 4         0.333    0.317    0.254    0.663

(b) 8-bit fixed point implementation
Training set   Usr. 1   Usr. 2   Usr. 3   Usr. 4
Usr. 1         0.604    0.446    0.375    0.321
Usr. 2         0.446    0.825    0.354    0.208
Usr. 3         0.379    0.292    0.596    0.375
Usr. 4         0.333    0.321    0.263    0.642

(c) 16-bit fixed point implementation
Training set   Usr. 1   Usr. 2   Usr. 3   Usr. 4
Usr. 1         0.808    0.504    0.438    0.329
Usr. 2         0.396    0.804    0.358    0.192
Usr. 3         0.425    0.338    0.683    0.279
Usr. 4         0.329    0.235    0.254    0.604

(d) 32-bit fixed point implementation
Training set   Usr. 1   Usr. 2   Usr. 3   Usr. 4
Usr. 1         0.808    0.504    0.438    0.379
Usr. 2         0.391    0.800    0.358    0.208
Usr. 3         0.425    0.338    0.683    0.346
Usr. 4         0.329    0.325    0.254    0.604
7.5. Multiple user activity recognition

In addition to the previous tests, the performance of the algorithm in a multiuser scenario, where a single cube is shared among different users, has been evaluated. Table 10a–d presents the CCR when the models trained on one user are validated on the other users, for the different implementations.
These tables show that the selected gesture recognition algorithm performs poorly in a multiuser scenario. However, this behavior is not related to the data representation but only to the classifier used in this study. Therefore, if a variant of the standard HMM model shows better performance, this gain will be preserved in the fixed point implementation. We believe that this is related to the choice of features. Since the classifier uses the magnitude of the acceleration, it has no information regarding direction. This makes gestures from different users look similar, depending on how the user performs the movement (more or less sharp movements). Another choice of features may improve the system performance. This can be seen from Table 11a–d, which reports the confusion matrices for the case where performer 2 uses the cube trained by performer 1. As can be seen from these tables, the classifier tends to confuse gestures X and flip with circle, while gesture square is better recognized. Still, the confusion matrices for the 16 and 32 bit fixed point implementations show better results than the floating point one.
Table 9
Classification performance for user 2.

(a) Precision
Class            PR 8b   PR 16b   PR 32b   PR fl
Gesture circle   0.714   0.643    0.634    0.763
Gesture square   0.952   1.000    1.000    1.000
Gesture X        0.956   0.953    0.952    0.947
Gesture flip     0.725   0.710    0.710    0.742

(b) Recall and correct classification ratio
Class            RC 8b   RC 16b   RC 32b   RC fl
Gesture circle   0.750   0.750    0.750    0.750
Gesture square   1.000   0.967    0.967    0.967
Gesture X        0.717   0.683    0.667    0.900
Gesture flip     0.833   0.817    0.817    0.817
CCR              0.825   0.804    0.800    0.858
Table 11
Confusion matrices in the multiuser scenario – performance when tester 2 uses the cube trained by tester 1. Rows: performed gesture; columns: classified as.

(a) Floating point (CCR = 0.496)
Gesture   Circle   Square   X    Flip
Circle    49       2        8    1
Square    3        50       7    0
X         57       1        1    1
Flip      38       0        3    19

(b) Fixed point 8 bit (CCR = 0.446)
Gesture   Circle   Square   X    Flip
Circle    26       9        23   2
Square    0        37       23   0
X         38       1        19   2
Flip      30       0        5    25

(c) Fixed point 16 bit (CCR = 0.504)
Gesture   Circle   Square   X    Flip
Circle    44       6        8    2
Square    3        48       9    0
X         46       6        7    1
Flip      32       0        6    22

(d) Fixed point 32 bit (CCR = 0.504)
Gesture   Circle   Square   X    Flip
Circle    45       6        8    1
Square    3        47       10   0
X         46       6        7    1
Flip      32       0        6    22
As in the single user scenario, this behavior is a consequence of the fact that the Baum–Welch training algorithm does not produce the best possible models, and the error introduced by the fixed point data representation may increase the CCR.

8. Conclusions

The popularity of TUIs, physical objects used for human–computer interaction, is growing together with the development of VR applications and smart spaces. The effectiveness of TUIs can be enhanced by the use of smart objects able to sense their status and the gestures that the user performs with them. On-board recognition of gestures will therefore play a central role in the development of new TUIs, in order to improve object battery lifetime, system scalability and the handling of moving TUIs.
In this paper, we presented and characterized an implementation of the HMM forward algorithm suitable for the class of low-power, low-cost MCUs typically embedded into TUIs. HMMs are state-of-the-art algorithms for gesture and speech recognition. The proposed solution can be implemented on the SMCube, a tangible interface developed as a building block of the TANGerINE framework.
The characterization of our algorithm in both single and multiuser scenarios demonstrates that the use of fixed point data representation results in a recognition ratio comparable to the floating point case when using 16 or more bits. We evaluated the computational and memory cost of implementing a solution able to recognize four gestures with a set of 10-state HMMs. We show that the flash memory available on the SMCube is enough to store all versions of the model, and that if fast recognition capabilities are needed we can use HMMs with a lower number of states without excessive recognition loss. By increasing the CPU clock to 20 MHz, the 16 bit version of the algorithm can be executed in real time at a 100 Hz sampling rate. By decreasing the sampling rate we are also able to implement the 32 bit version and thus achieve better accuracy.
Even if the classification algorithm in some cases does not achieve extremely high accuracy, this behavior is not related to the fixed point implementation but to the chosen algorithm itself. Thus, if a different version of the HMM shows a better classification ratio, it can be implemented using at least 16-bit fixed point operands on the SMCube with minimal performance loss.

Acknowledgments

Part of this work has been supported by the TANGerINE project (www.tangerineproject.org), by the ARTISTDESIGN project funded under EU FP7 (Project Reference: 214373) (www.artist-embedded.org/artist/) and by the SOFIA project funded under the European Artemis programme SP3 Smart environments and scalable digital services (Grant agreement: 100017) (www.sofia-project.eu).

References

[1] Atmel, Atmel products, 2009. Available from:
. [2] V. Baier, L. Mosenlechner, M. Kranz, Gesture classification with hierarchically structured recurrent self-organizing maps, in: Networked Sensing Systems, INSS’07, Fourth International Conference, 2007, pp. 81–84. [3] G. Bailador, D. Roggen, G. Tröster, G. Triviño, Real time gesture recognition using continuous time recurrent neural networks, in: BodyNets’07: Proceedings of the ICST Second International Conference on Body Area networks, ICST, Brussels, Belgium, Belgium, ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2007, pp. 1–8. [4] S. Baraldi, A. Del Bimbo, L. Landucci, A. Valli, wikitable: finger driven interaction for collaborative knowledge-building workspaces, in: CVPRW’06: Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society, Washington, DC, USA, 2006, p. 144.
[5] S. Baraldi, A. Del Bimbo, L. Landucci, N. Torpei, O. Cafini, E. Farella, A. Pieracci, L. Benini, Introducing tangerine: a tangible interactive natural environment, in: Proceedings of ACM International Conference on Multimedia (MM), ACM Press, Augsburg, Germany, 2007, pp. 831–834. [6] Bluegiga, Bluegiga bluetooth modules, 2009. Available from:
. [7] M. Bruns Alonso, V. Keyson, Musiccube: a physical experience with digital music, in: Personal Ubiquitous Computing, vol. 10, Springer-Verlag, London, UK, 2006, pp. 163–165. [8] O. Cafini, Z. Piero, E. Farella, L. Benini, S. Baraldi, N. Torpei, L. Landucci, A. Del Bimbo, Tangerine smcube: a smart device for human computer interaction, in: Proceedings of IEEE European Conference on Smart Sensing and Context, IEEE Computer Society, 2008. [9] K. Camarata, E.Y.-L. Do, M.D. Gross, B.R. Johnson, Navigational blocks: tangible navigation of digital information, in: CHI’02: CHI’02 Extended Abstracts on Human Factors in Computing Systems, ACM, 2002, pp. 752–753. [10] C. Chen, J. Liang, H. Zhao, H. Hu, J. Tian, Factorial hmm and parallel hmm for gait recognition, in: Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions, vol. 39, 2009, pp. 114–123. [11] E. Farella, A. Pieracci, A. Acquaviva, Design and implementation of wimoca node for a body area wireless sensor network, in: Systems Communications, Proceedings, 2005, pp. 342–347. [12] F.G. Hofmann, P. Heyer, G. Hommel, Velocity profile based recognition of dynamic gestures with discrete hidden markov models, in: Proceedings of the International Gesture Workshop on Gesture and Sign Language in Human– Computer Interaction, Springer-Verlag, London, UK, 1998, pp. 81–95. [13] C.-R. Huang, C.-S. Chen, P.-C. Chung, Tangible photorealistic virtual museum, in: IEEE Computer Graphics and Applications, vol. 25, IEEE Computer Society Press, Los Alamitos, CA, USA, 2005, pp. 15–17. [14] H. Ishii, The tangible user interface and its evolution, in: Communication of the ACM, vol. 51, ACM, New York, NY, USA, 2008, pp. 32–36. [15] H. Ishii, B. Ullmer, Tangible bits: towards seamless interfaces between people, bits and atoms, in: CHI’97: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 1997, pp. 234–241. [16] K. Jeong, J. Won, C. Bae, User activity recognition and logging in distributed intelligent gadgets, in: Multisensor Fusion and Integration for Intelligent Systems, MFI 2008, IEEE International Conference, 2008, pp. 683–686. [17] S. Jordà, M. Kaltenbrunner, G. Geiger, R. Bencina, The reactable, in: Proceedings of the International Computer Music Conference (ICMC 2005), Barcelona, Spain, 2005. [18] M. Kaltenbrunner, R. Bencina, Reactivision: a computer-vision framework for table-based tangible interaction, in: TEI’07: Proceedings of the First International Conference on Tangible and Embedded Interaction, ACM, 2007, pp. 69–74. [19] E. Keogh, S. Chu, D. Hart, M. Pazzani, An online algorithm for segmenting time series, in: Proceedings of the IEEE International Conference on Data Mining, 2001, pp. 289–296. [20] L. Kim, H. Cho, S.H. Park, M. Han, A tangible user interface with multimodal feedback, in: Twelfth International Conference on Human–Computer Interaction HCI, 2007, pp. 94–103. [21] M. Ko, G. West, S. Venkatesh, M. Kumar, Online context recognition in multisensor systems using dynamic time warping, in: Proceedings of Conference on Intelligent Sensors, Sensor Networks and Information Processing Conference, 2005, pp. 283–288. [22] S. Mann, ‘‘Smart clothing”: wearable multimedia computing and ‘‘personal imaging” to restore the technological balance between people and their environments, in: MULTIMEDIA’96: Proceedings of the Fourth ACM International Conference on Multimedia, ACM, New York, NY, USA, 1996, pp. 163–174. [23] K. Matthias, S. Dominik, H. Paul, S. 
Albrecht, A display cube as tangible user interface, in: In Adjunct Proceedings of the Seventh International Conference on Ubiquitous Computing (Demo 22), 2005. [24] A. Mazalek, M. Reynolds, G. Davenport, Tviews: an extensible architecture for multiuser digital media tables, in: Computer Graphics and Applications, IEEE, vol. 26, 2006, pp. 47–55. [25] M.A. Mendoza, N. Pérez De La Blanca, Applying space state models in human action recognition: a comparative study, in: AMDO’08: Proceedings of the Fifth International Conference on Articulated Motion and Deformable Objects, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 53–62. [26] Microsoft Corporation, Microsoft surface, 2009. Available from:
. [27] S. Moon, J.-N. Hwang, Robust speech recognition based on joint model and feature space optimization of hidden markov models, in: IEEE Transactions on Neural Networks, vol. 8, March 1997, pp. 194–204. [28] Nintendo, Wii homepage, 2008. Available from:
. [29] S. Oh, W. Woo, Manipulating multimedia contents with tangible media control system, in: Third International Conference on Entertainment Computing ICEC, 2004, pp. 57–67. [30] C. O’Malley, D. Stanton Fraser, Literature Review in Learning with Tangible Technologies, Technical Report, Learning Sciences Research Institute, University of Nottingham, Department of Psychology, University of Bath, 2004. [31] K.S. Park, H.S. Cho, J. Lim, Y. Cho, S.-M. Kang, S. Park, Learning cooperation in a tangible moyangsung, in: R. Shumaker (Ed.), HCI (14), vol. 4563, Lecture Notes in Computer Science, Springer, 2007, pp. 689–698.
[32] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, in: Proceedings of the IEEE, vol. 77, 1989, pp. 257–285. [33] M. Rauterberg, T. Mauch, R. Stebler, The digital playing desk: a case study for augmented reality, in: Robot and Human Communication, Fifth IEEE International Workshop, 1996, pp. 410–415. [34] D. Roggen, N. Bharatula, M. Stäger, P. Lukowicz, G. Tröster, From sensors to miniature networked sensor buttons, in: Proceedings of the Third International Conference on Networked Sensing Systems, INSS 2006, 2006, pp. 119–122. [35] A. Smailagic, D.P. Siewiorek, U. Maurer, A. Rowe, K.P. Tang, ewatch: context sensitive system design case study, in: ISVLSI’05: Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design, IEEE Computer Society, Washington, DC, USA, 2005, pp. 98–103. [36] Sony, Eyetoy homepage, 2003. Available from:
. [37] T. Stiefmeier, G. Ogris, H. Junker, P. Lukowicz, G. Tröster, Combining motion sensors and ultrasonic hands tracking for continuous activity recognition in a maintenance scenario, in: Tenth IEEE International Symposium on Wearable Computers, 2006. [38] T. Stiefmeier, D. Roggen, G. Ogris, P. Lukowicz, G. Tröster, Wearable activity tracking in car manufacturing, in: IEEE Pervasive Computing, vol. 7, 2008, pp. 42–50. [39] STM, Stm tri-axial accelerometer, 2009. Available from:
. [40] M. Tahir, G. Bailly, E. Lecolinet, Aremote: a tangible interface for selecting TV channels, in: Artificial Reality and Teleexistence, in: Seventh International Conference, 2007, pp. 298–299.
[41] K. Ueda, A. Kosaka, R. Watanabe, Y. Takeuchi, T. Onoye, Y. Itoh, Y. Kitamura, F. Kishino, m-activecube; multimedia extension of spatial tangible user interface, in: Proceedings of the Second International Workshop on Biologically Inspired Approaches to Advanced Information (BioADIT), 2006, pp. 363–370. [42] J. Verhaegh, W. Fontijn, A. Jacobs, On the benefits of tangible interfaces for educational games, in: Digital Games and Intelligent Toys Based Education, Second IEEE International Conference, 2008, pp. 141–145. [43] R. Watanabe, Y. Itoh, Y. Kitamura, F. Kishino, H. Kikuchi, Distributed autonomous interface using activecube for interactive multimedia contents, in: ICAT’05: Proceedings of the 2005 International Conference on Augmented Tele-existence, ACM, New York, NY, USA, pp. 22–29. [44] M. Weiser, The Computer for the 21st Century, vol. 265, Scientific American, 1991, pp. 66–75. [45] P. Zappi, T. Stiefmeier, E. Farella, D. Roggen, L. Benini, G. Troster, Activity recognition from on-body sensors by classifier fusion: sensor scalability and robustness, in: Intelligent Sensors, Sensor Networks and Information, ISSNIP 2007, Third International Conference, 2007, pp. 281–286. [46] L. Zeller, L. Scherffig, Cubebrowser: a cognitive adapter to explore media databases, in: CHI EA’09: Proceedings of the 27th International Conference Extended Abstracts on Human factors in Computing Systems, ACM, 2009, pp. 2619–2622. [47] S. Zhong, J. Ghosh, Hmms and coupled hmms for multi-channel eeg classification, in: Neural Networks, IJCNN’02, Proceedings of the 2002 International Joint Conference, vol. 2, 2002, pp. 1154–1159.