Activity recognition with android phone using mixture-of-experts co-trained with labeled and unlabeled data

Activity recognition with android phone using mixture-of-experts co-trained with labeled and unlabeled data

Neurocomputing 126 (2014) 106–115 Contents lists available at ScienceDirect Neurocomputing journal homepage: www.elsevier.com/locate/neucom Activit...

2MB Sizes 3 Downloads 99 Views

Neurocomputing 126 (2014) 106–115

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Activity recognition with android phone using mixture-of-experts co-trained with labeled and unlabeled data Young-Seol Lee, Sung-Bae Cho n Department of Computer Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, South Korea

art ic l e i nf o

a b s t r a c t

Article history: Received 27 February 2012 Received in revised form 20 March 2013 Accepted 30 May 2013 Available online 30 July 2013

As the number of smartphone users has grown recently, many context-aware services have been studied and launched. Activity recognition becomes one of the important issues for user adaptive services on the mobile phones. Even though many researchers have attempted to recognize a user's activities on a mobile device, it is still difficult to infer human activities from uncertain, incomplete and insufficient mobile sensor data. We present a method to recognize a person's activities from sensors in a mobile phone using mixture-of-experts (ME) model. In order to train the ME model, we have applied global– local co-training (GLCT) algorithm with both labeled and unlabeled data to improve the performance. The GLCT is a variation of co-training that uses a global model and a local model together. To evaluate the usefulness of the proposed method, we have conducted experiments using real datasets collected from Google Android smartphones. This paper is a revised and extended version of a paper that was presented at HAIS 2011. & 2013 Elsevier B.V. All rights reserved.

Keywords: Mixture-of-experts Co-training Activity recognition Android phone

1. Introduction Smart phones, such as Google Android phone and Apple iPhone, have recently incorporated diverse and powerful sensors. The sensors include GPS receivers, microphones, cameras, light sensors, temperature sensors, digital compasses, magnetometers and accelerometers. Because of the small size and the superior computing power of the smartphones, they can become powerful sensing devices that can monitor a user's context in real time. Especially, Android-based smart phones are suitable platforms to collect sensing information because the Android operating system is free, open-source, easy to program, and expected to become a dominant smartphone in the marketplace. Many context-aware services were introduced and they tried to provide a user with convenience on a mobile phone. For instance, there are social networking applications such as Facebook [1] and MySpace [2]. Foursquare provides a location-based social networking service. It ranks the users by the frequency of visiting a specific location and encourages them to check in the place. Loopt service can recommend some locations for the users, and Whoshere service helps users to share friends' locations. Davis et al. tried to use temporal, spatial, and social contexts to manage multimedia

n

Corresponding author. Tel.: +82 2 2123 2720; fax: +82 2 365 2579. E-mail addresses: [email protected] (Y.-S. Lee), [email protected] (S.-B. Cho). 0925-2312/$ - see front matter & 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.neucom.2013.05.044

content with a camera phone [3]. Until now, most of the commercial services use only raw data like GPS coordinates. In the mobile environment, inferring context information becomes an important issue for mobile cooperative services. By capturing more meaningful context in real time, we can develop more adaptive applications to the changing environment and user preferences. Activity recognition technology is one of the core technologies to provide user adaptive services. Many researchers have attempted to infer high-level semantic information from raw data collected in a mobile device. Bellotti et al. developed a Maggiti system which predicted a user's activities (eating, shopping, etc.) and provided suitable services on Windows Mobile platform [4]. Kwapisz et al. studied a method to recognize a user's behavior using cell phone accelerometers [5]. Chen proposed intelligent location-based mobile news service as a kind of locationbased service [6]. Other researchers have studied context-aware systems considering efficient energy management. Paek et al. controlled GPS receiver to reduce energy consumption on a smartphone [7]. Ravi et al. predicted battery charge cycle and recommended the charge [8]. Other researchers tried to stop unnecessary functionalities in the context. It is difficult to recognize a user's situation because of uncertainty and incompleteness of context in mobile environment. Most of the research used various statistical analysis and machine learning techniques such as probabilistic model, fuzzy logic, and case based reasoning. However, it is still difficult to apply them to mobile devices because of two problems. First, although the machine learning techniques are appropriate to deal

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

107

2. Related works

provide useful information for activity recognition. Accelerometers were initially included in these devices to support advanced game play and to enable automatic screen rotation but they clearly have many other applications. In fact, there are many useful applications that can be built if accelerometers can be used to recognize a user's activity. Therefore, using the accelerometer-embedded mobile phones to recognize people′s physical activities becomes the primary choice among all the solutions. Győrbíró et al. presented a system to recognize a person's activities using MotionBand device. They used a feed-forward neural network to classify activities from acceleration data [14]. Berchtold et al. proposed an architecture to recognize a user's activities for services. The system utilizes classifiers based on fuzzy inference systems which use real time annotation to personalization [15]. Zhao et al. developed a smart-phone based activity reporting system using a three axial accelerometer. The system used decision tree and k-means clustering algorithm to recognize a user's activity [16]. In order to monitor elderly people's health, Zhao et al. presented an activity recognition method based on ELM (a variation of neural network) using accelerometer in a mobile phone [17]. Kwapisz et al. developed an Android phone based activity recognition method using an accelerometer [5]. Table 1 summarizes the studies to recognize human activities using accelerometers. Several researchers have investigated activity recognition using various sensors as well as accelerometers as shown in Table 2. However, most of the research focuses on accurate activity recognition without considering modularization and unlabeled data for the improved performance.

2.1. Activity recognition using mobile sensors

2.2. Mixture-of-experts using unlabeled data

There are many attempts to recognize a user's actions and behaviors with mobile sensors. Among all these mobile sensors, the accelerometer is commonly used for activity recognition. All of these Android phones contain tri-axial accelerometers that measure acceleration in all three spatial dimensions. The accelerometers are also capable of detecting the orientation of the device (helped by the fact that they can detect the direction of Earth's gravity), which can

ME models [12] consist of a set of local experts and a gating network which combines the decisions from experts. The algorithm breaks down a complex problem into a set of simple subproblems in a recursive manner. The solutions to the sub-problems are combined to make a solution to the complex problem. In the ME model, it is assumed that there are separate sub-problems in a more complex underlying problem. According to the divide-and-

with vagueness and uncertainty in mobile environment, it is not easy to classify user's activities using only one classifier from incomplete mobile data in a complex environment. The other problem is that many supervised machine learning techniques need hand-labeled data samples for good classification performance. It requires users to annotate the sensor data. In many cases, unlabeled data are significantly easier to come by than labeled ones [9,10]. These labeled data are fairly expensive to obtain because they require human effort [11]. Therefore, we would like our learning algorithm to be able to take as much advantage of the unlabeled data as possible. In this paper, we present an activity recognition system using mixture of experts (ME) [12] on Android phone. ME is one of the divide-and-conquer models which are effective solutions to overcome the limitations of mobile environment. The ME is based on several local experts to solve smaller problems and a gating network to combine the solutions from separate experts, each of which is a statistical model for a part of overall problem space. In order to train the ME with labeled and unlabeled data, the global– local co-training (GLCT) method is used. GLCT is one of the variations of co-training [13], and enables us to improve the performance of ME model by minimizing errors which occur due to the unlabeled data. To evaluate the usefulness of the proposed system, we conducted experiments using real datasets collected from Google Android smart phones.

Table 1 Activity recognition with accelerometers in a mobile device. Author

Year

Classifier

Contribution

Frank et al. [18] Sun et al. [19] Brezmes et al. [20] Bieber et al. [21] Yang et al. [22] Ravi et al. [23]

2011 2010 2009 2009 2009 2005

Bao et al. [24]

2004

Geometric template matching (GTM) Support vector machine (SVM) K-nearest neighbor Heuristics (rule) Decision tree, k-means clustering, HMM Decision tables, C4.5 Decision trees, k-nearest neighbor, SVM, naïve Bayes Decision tables, instance based learning, C4.5 decision trees, naïve Bayes

Demonstration for activity and gait recognition system Activity recognition with an accelerometer considering people's pocket position Activity recognition with no server processing data for a reasonable low cost Identification of physical activity in everyday life on a standard mobile phone Using decision tree to classify physical activities and HMM to consider activity sequences Combining classifiers using Plurality Voting for activity recognition from a single accelerometer Recognizing a variety of household activities using acceleration for context-aware computing

Table 2 Activity recognition with various sensors in a mobile device. Author

Year

Sensors

Classifier

Contribution

Hong et al. [25]

2010

Accelerometer, RFID tags

Decision tree

Wang et al. [26]

2009

Threshold based heuristics

Choudhury et al. [27]

2008

Consolvo et al. [28]

2008

Accelerometer, microphone, GPS, WiFi, Bluetooth Accelerometer, barometer, compass, humidity, temperature Accelerometer, barometer, microphone, humidity, etc.

Hierarchical approach for robust and accurate recognition Sensor management for mobile devices to monitor user state Using Mobile Sensing Platform (MSP) for activity recognition System for on-body sensing, and activity inference, and three-week field trial

Probabilistic model Boosted decision stump classifiers

108

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

conquer principle, the ME model should work well for problems composed of smaller unconnected ideas. The experts can be modeled to give a solution to these smaller sub-problems, while the combination of the solutions from the expert will be given by the gating network. Each expert deals with different features from a different perspective, thereby resolving the small separable problems. Through the features of the model, it has been applied to traditional complicated recognition problems [29,30]. One of the considerations when implementing ME models is the way to separate the input data space and generate experts. In order to divide the input data space into several subgroups, clustering techniques can be applied as shown in [31]. Some clustering techniques such as k-means clustering and hierarchical clustering have been widely used. In supervised learning, local experts in the ME model can be trained easily by using conventional machine learning techniques, for example, MLP [32], support vector machine (SVM) [33] or probabilistic model [34]. However, in semi-supervised learning, it requires an extra learning method to deal with unlabeled data. The simplest way to train the ME model with both labeled and unlabeled data is to apply previous semi-supervised learning methods such as individual-training or co-training directly to the model. However, it is already known that local experts are sensitive about the number of the training instances and can be biased or the variance can be increased when there are only a few amounts of instances [35]. Especially in semi-supervised learning which uses only a small amount of labeled data, the initial ME model may not be accurate enough to evaluate unlabeled data. In order to overcome this limitation, the semi-supervised learning method specialized to the ME model is required. The EM algorithm to train the ME model in semi-supervised approach was proposed for this reason. Nevertheless, it has a limitation caused by the model assumption of EM algorithm mentioned in the previous section, and another enhanced method is still required. In this paper, we use the GLCT method to train the ME model. The method is specialized to train the ME model since it overcomes the imperfect model problem related to the lack of training instances by using the global model together.

3. Activity recognition using ME The proposed activity recognition system consists of three steps: to collect sensor data, preprocess the data and recognize a user's activity. First, after sensor data are collected from sensors on a smartphone, the data are transferred to preprocessing units for extracting features such as mean and standard deviation. The following Eqs. (1)–(3) denote the features such as the difference between previous acceleration and current acceleration, average acceleration, and standard deviation of acceleration that are extracted

from the datasets. qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi N sum_X ¼ ∑ ðxi xi1 Þ2

ð1Þ

i¼1

mean_X ¼

∑N i¼1

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðxiþ1 xi Þ2 N

sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 std_X ¼ ∑ ðxiþ1 xi Þ2 mean_X

ð2Þ

ð3Þ

After the preprocessing step, ME model is used to recognize a user's activities from the feature set. Fig. 1 shows the process of the entire system.

3.1. Mixture-of-experts construction Prior to applying the ME model to activity recognition, local experts must be built with separate dataset. Each expert deals with only a part of dataset. When dividing entire data into several subgroups, each subgroup should guarantee similarities between members in it. This is required to give specialties to local experts in distinguishing some similar patterns even though they are only a part of entire data as shown in Fig. 2. The clustering technique is used to define boundaries to estimate similarities between data. K-means clustering approach is a method of cluster analysis which aims to assign n data items into k sets in which each item belongs to the set with the nearest mean value. It is a simple method which is easy to implement and shows good performance on common dataset. The distance from the centroids to data items can be calculated with various measures such as Euclidean distance and Pearson correlation coefficient. K-means clustering consists of the following process of five steps. (1) (2) (3) (4) (5)

Determine k, the number of clusters. Calculate k centroids. Calculate distances of data items to centroids. Assign data items into k sets in terms of the nearest centroid. Iterate (2)–(4) until there is no change of k sets.

In order to implement the global and the local models including local experts and the gating network, MLP was used for all the models. For the global model, conventional MLP was applied directly. The ME model was designed to include the five local experts in total. Here, MLP is employed as a base classifier for each expert in mixture-of-local-experts. Fig. 3 depicts the structure of the ME model used. To train the ME model, learning technique for MLP-based ME model [32] is used. fL (X) in Eqs. (4) and (5) denotes the output of the mixture-of-experts model where πi is the ith output of gating network and fi(X) is the output of the ith local

Fig. 1. Activity recognition system overview.

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

109

Fig. 2. Training mixture-of-experts model.

Fig. 3. Mixture-of-experts model.

expert. X is an input vector with multivariate attributes. k

f L ðXÞ ¼ ∑ π i f i ðXÞ; i¼1

g

∑ π i ¼ 1;

i¼1

X ¼ fx1 ; ⋯; xn g

π i ≥0ði ¼ 1; ⋯; kÞ

ð4Þ

ð5Þ

Each neural network in the model used sigmoid function like Eq. (6) as activation function for a hidden layer and an output layer. aðxÞ ¼

1 1 þ ex

ð6Þ

3.2. GLCT algorithm GLCT algorithm is summarized in Fig. 4. Given a set L of labeled training instances, we can obtain the global model MG and the ME model ML which consists of N experts μ1, μ2, …, μN and the gating network g. The algorithm iteratively extracts the instances which are going to be labeled based on two criteria, preventing degradations and expecting improvements. Since the models can be imperfect and show poorer performance than the opponent in some conditions, the models should carefully predict labels, especially when there are not sufficient training data for local experts to generalize the entire problem. In order to prevent degradations due to model imperfections, the algorithm compares the confidences from MG and ML. The models can provide labeled instances for the opponents only if each model is more confident than the other one. The instances satisfying the first criterion are added to candidate sets U′L or U′G. After obtaining the candidate sets, the sets are sorted by considering confidence degrees in descending order. At the next

Fig. 4. Global–local co-training algorithm.

step, at most N instances for each class with higher confidence degree than the certain threshold e are chosen gradually from the top of the set to be labeled. By choosing instances with sufficient confidence, it is expected that the models would be improved at the next iteration. The chosen instances are added to the training dataset for each model, LG and LL. The models are trained with newly extended

110

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

labeled dataset. If LG and LL do not change, the training stops. Afterwards, the algorithm iterates the training procedure until no additional data samples for LL and LG remained. In the co-training algorithm, the local and the global models are based on feed-forward neural networks trained with backpropagation learning. The neural networks are repeatedly trained by gradient descent with the error function on given training set from the other model. Newly labeled data by a global model are used to train a local model in the next step and vice versa. In Eqs. (7) and (8), EL and EG denote error functions in the additional training set for a local model and a global model, respectively. EL ¼ jjOG ∑π i f Li ðxÞjj

ð7Þ

EG ¼ jjOL f G ðxÞjj

ð8Þ

In Eq. (7), an error function for a local model, OG, is the output vector of a global model, πi is the gating value of the ith local expert, and fLi(x) is the output vector of the ith local expert. OL is the output vector of a local model and fG (x) is the output vector of a global model. In detail, OG in Eq. (7) can be replaced with the output function of a global model at the previous step, f′G (x) and output function of a local model at the previous step,∑π i0 f Li0 ðxÞ can be substituted for OL in Eq. (8) as shown in Eqs. (9) and (10). EL ¼ jjf G0 ðxÞ∑π i f Li ðxÞjj

ð9Þ

EG ¼ jj∑π i0 f Li0 ðxÞf G ðxÞjj

ð10Þ

Ideally, the co-training algorithm is stopped when there is no difference between current error and previous error as illustrated in Eq. (11). co-training is stopped if

∂EL ∂EG ¼ 0 and ¼0 ∂x ∂x

ð11Þ

However, it is practically difficult to satisfy the condition because it is satisfied only when two different views have the same error value or no change of error values from additional training set. In this paper, we determine when the co-training is stopped by the

co-training is stopped when NðLL Þ ¼ 0 and NðLG Þ ¼ 0

ð12Þ

3.3. Measuring confidence degrees In order to obtain confidence degrees of examples, we design the measure for confidences. Let conf(w,x) be the original confidence degree for input instance x defined by the corresponding model w. conf(w, x) can be changed by the type of model used to build the global or the local model. In this work, we implemented both the global and the local models by using MLP. The output of MLP can be estimated as confidence degrees, which can be simply defined as follows: conf ðw; xÞ ¼ fwðxÞjwðxÞ is output of w for xg

ð13Þ

Since the original problem space is divided into several subspaces to generate local experts, the number of training data for each expert is rather smaller than the global model. This can make the ME model have large bias or variance, because there are chances that some of the training instances for certain class are missing in some sub-regions even though the original space contains all classes. Since some experts have not learned about certain classes, they can generate unexpected values as outputs for the missing classes according to the type of models used as local experts. To prevent this problem, the existence of training instances of certain class should be checked prior to computing confidence degrees. Eqs. (14) and (15) show the modified confidence degree for class C of the global model and the ME model, respectively. ( M G;C ðxÞ; if N MG;C 4 0 conf ðM G ; x; cÞ ¼ ð14Þ 0; otherwise N

conf ðM L ; x; cÞ ¼ ∑ g i conf ðμi ; x; cÞ i¼1

(

Table 3 A summary of the datasets used. Dataset

number of additional training sets N for a global model and a local model. The number of additional training sets N(LL) and N(LG) are decided by the confidence degree, introduced in the next section.

conf ðμi ; x; cÞ ¼ #Features #Classes #Samples

Acceleration only 6 Acceleration, orientation, magnetic field, etc. 22

3 5

1107 661

μi;C ðxÞ;

if N μi;c 4 0

0;

otherwise

ð15Þ

μi is the ith local expert, and g is the gating network. MG,C(x) and μi, c(x) represent the outputs of MG and μi for class C, respectively. gi is the gating output for the ith local expert and NMG,C and Nμi,C are

Fig. 5. (a) Class and datasets, and (b) Android phone based data collection interface.

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

the numbers of training instances for class C used to train MG and μi, respectively.

4. Experiments This section presents the experiments conducted to evaluate the usefulness of the proposed method on the pattern classification problems. The number of hidden nodes for each neural network is fixed to ten and the learning rate is 0.3. There are two objectives of the experiments. Firstly, it is to prove the usefulness of co-training Table 4 Accuracy comparison for each algorithm on dataset 1 (5-folds). Algorithm

No. of labeled data

Mixture of local 1107 experts Mixture of local 55 experts Co-training based ME 55 Mixture of local 110 experts Co-training based ME 110 Mixture of local 166 experts Co-training based ME 166

No. of unlabeled data

Accuracy (%)

91.70 ( 7 7.61)

0 1052

75.74 ( 7 14.56)

1052 997

86.34 ( 7 11.11) 80.87 ( 7 11.05)

997 941

90.17 ( 7 5.42) 83.04 ( 7 9.12)

941

90.81 ( 7 2.96)

Table 5 Accuracy comparison for each algorithm on dataset 1 (10-folds). Algorithm

No. of labeled data

No. of unlabeled data

Accuracy (%)

Mixture of local experts Mixture of local experts Co-training based ME Mixture of local experts Co-training based ME Mixture of local experts Co-training based ME

1107

0

92.56 ( 71.05)

55

1052

76.28 ( 77.39)

55

1052

87.27 ( 79.92)

110

997

81.07 ( 75.59)

110

997

90.25 ( 75.82)

166

941

83.48 ( 75.10)

166

941

91.78 ( 74.72)

111

based on unlabeled data. Secondly, our proposed method shows better performance than semi-supervised self-training as [36]. As shown in Fig. 2, we divide the dataset into some groups to learn local expert models in our mixture of experts. In the experiments, we use k-means clustering approach to learn an ME model with five experts. In the initial stage, the labeled data are used to learn the model. After the first learning, the unlabeled data are labeled using the previous learned models. 4.1. Android phone datasets For the experiments, the evaluation of the proposed method is conducted on two Android phone datasets as shown in Table 3. The two types of datasets consist of acceleration, orientation, and magnetic fields to classify a user's transportation mode which are collected from Android OS smart phones using an interface as shown in Fig. 5(b). The dataset can be downloaded and tested in the author's homepages (http://sclab.yonsei.ac.kr/movingstatus. csv and http://sclab.yonsei.ac.kr/movingstatus_2.csv). For all datasets, data instances were partially labeled depending on the ratio of the instances labeled. Labeled data instances were chosen randomly by ignoring class distributions. The class labels are still, walk, and run in dataset 1 and still, walk, vehicle, run, and subway in dataset 2 as shown in Fig. 5(a). 4.2. Experimental results with android phone datasets Firstly, we conduct experiments to show the improvement of performance by using unlabeled data. For each experiment, the performances of models trained in three different conditions that are trained with all labeled data, only with partially labeled data and with both labeled and unlabeled data were evaluated by 5-folds and 10-folds cross validation. All experiments were performed ten times to obtain the average accuracy. First of all, we conducted learning process for two different datasets to show the feasibility of the proposed method. Especially, in order to show that GLCT performs well regardless of the ratio of initial labeled data, we changed the ratio of labeled instances to observe performances according to the amount of initial labeled data. Tables 4 and 5 and Fig. 6 show the result of the test on dataset 1 and Tables 5–7 and Fig. 7 show the result of dataset 2. As the result, the model trained with both labeled and unlabeled data showed better performance than the model with only labeled data. There is no big difference between 5-folds and 10-folds cross validation. The standard NN and ME models were trained well for all datasets by the proposed GLCT method. This

Fig. 6. Performances according to the ratio of labeled instances for dataset 1.

112

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

result implied that the GLCT can be used as the training method for the ME model in semi-supervised approach. The performance of the model was improved in every case from 7% to 10% for dataset 1 and from 7% to 8% for dataset 2. With only 5% initial labeled data, the model showed the most significant performance improvement for dataset 1. However, the results for dataset 2 did not show the big difference in performance improvement.

Table 6 Accuracy comparison for each algorithm on dataset 2 (5-folds). Algorithm

No. of labeled data

No. of unlabeled data

Accuracy (%) for data set 2

Mixture of local experts Mixture of local experts Co-training based ME Mixture of local experts Co-training based ME Mixture of local experts Co-training based ME

661

0

71.56 (7 5.85)

33

628

60.94 (7 8.28)

33

628

68.09 (7 3.69)

66

595

54.87 (7 3.90)

66

595

61.85 (7 2.94)

99

562

60.75 (7 6.61)

99

562

68.10 (7 2.63)

Table 8 shows the result of paired t-tests to examine the significant difference between ME model trained with only partially labeled data and that trained with labeled and unlabeled data. The t-value is greater than 4.78 for all tests with the degree of freedom 9 and the probability is less than 0.001, indicating that the two models are significantly different in their accuracy with 99.9% confidence. On the other hand, Table 9 summarizes the result of F-tests to evaluate the statistical difference between ME model trained with only partially labeled data and that trained with labeled and unlabeled data. The values in Table 9 are greater than threshold for all tests, and it implies that the two models are significantly different with 99.9% confidence. Finally, we compared GLCT with the alternative way to train the ME model to show the superiority of GLCT. In this experiment, the Table 8 Paired t-test between ME_L and ME_UL.

Dataset 1 Dataset 2

5% Labeled

10% Labeled

15% Labeled

5.130445806 5.861555853

7.532020147 13.29366642

5.623460193 7.361158544

5% Labeled

10% Labeled

15% Labeled

21.45512662 30.75440238

42.58100057 30.05156706

37.26350818 32.78501936

Table 9 F-test between ME_L and ME_UL.

Table 7 Accuracy comparison for each algorithm on dataset 2 (10-folds). Algorithm

No. of labeled data

No. of unlabeled data

Accuracy (%) for dataset 2

Mixture of local experts Mixture of local experts Co-training based ME Mixture of local experts Co-training based ME Mixture of local experts Co-training based ME

661

0

73.30 ( 7 1.78)

33

628

61.03 ( 7 3.95)

33

628

68.10 ( 7 5.57)

66

595

55.12 ( 7 1.62)

66

595

63.61 ( 7 7.59)

99

562

61.03 ( 7 2.82)

99

562

68.98 ( 7 6.10)

Dataset 1 Dataset 2

Input: Labeled training data L Unlabeled training data U Classifier A Output: The predicted labels for unlabeled examples One trained classifier: A Repeat until U is empty: Build classifier A by L Use A to examples in U. For each class C, pick the unlabeled examples with higher confidence degree, and add them into L Fig. 8. Individual-training algorithm.

Fig. 7. The model performance comparison result for dataset 2.

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

individual-training algorithm [36,37] was used as the alternative technique to train the ME. As mentioned in Section 2, it is one of the conventional semi-supervised learning techniques to train the ME. Fig. 8 shows the algorithm of the individual-training. For the experiment, the dataset 1 and 2 with 5%, 10%, 15% labeled instances were used as shown in Table 10. GLCT produces significantly improved performance than the individual-training algorithm as shown in Figs. 9 and 10. Figs. 11 and 12 show the change of accuracies by iterating co-training and individual-training methods, respectively. Initial accuracies of the two methods were

similar, but as iterations increased, the GLCT yielded greater improvement in performance than the individual-training. In many cases, the performance of individual-training tends to get worse as the iteration goes by.

Table 10 Performances comparison between GLCT and individual-training.

Dataset1 Dataset2 Dataset1 Dataset2 Dataset1 Dataset2

Label ratio (%)

GLCT

5 5 10 10 15 15

86.34 68.09 90.17 61.85 90.81 68.10

Individual-training ( 7 11.11) ( 7 3.69) ( 7 5.42) ( 7 2.94) ( 7 2.96) ( 7 2.63)

52.16 45.33 61.12 49.81 66.08 57.04

( 75.91) ( 76.27) ( 73.75) ( 73.90) ( 75.38) ( 73.85)

113

Fig. 11. Change of accuracies by iterating two methods for dataset 1.

Fig. 9. Comparison between co-training and individual-training for dataset 1.

Fig. 10. Comparison between co-training and individual-training for dataset 2.

114

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

Fig. 12. Change of accuracies by iterating two methods for dataset 2.

5. Concluding remarks In this paper, we have proposed an activity recognition system using the mixture-of-experts (ME) model with both labeled and unlabeled data. ME is a variant of divide-and-conquer paradigm, and it is suitable to solve complex problem to recognize a user's activity from uncertain, and incomplete mobile sensor data. The GLCT method is also used to train the ME model. In order to overcome the limitation of the local experts with small amount of training data, we employed the GLCT method to improve the ME model. GLCT is a method to build a model in which unlabeled data can be used to augment labeled data, based on having two views (global model and mixture of experts). The two views are redundant but not completely correlated. It is assumed that unlabeled instances can potentially be used to increase the accuracy of the model. The contribution lies in applying ME model and the GLCT method to activity recognition on a mobile phone. To show the usefulness of the proposed system, we conducted experiments using datasets collected from Google Android smart phones. The result of the experiments showed that GLCT could train the ME model well by predicting unlabeled instances accurately, and outperformed the individual training algorithm for the ME model. We can use co-training applied to the large volume of unlabeled acceleration data, to improve the accuracy of activity recognition. It can also be applied to activity recognition using multiple sensors such as orientation, magnetic field and so on. The important point is that a large volume of unlabeled instances can easily be collected in real world, but the labeled instances are very expensive. GLCT can be a method to use the unlabeled data for context recognition. Even though the performance of GLCT was promising in experiments, the method should be justified with theoretical analysis. In the future, we are planning to prove that the ME model is learnable by GLCT theoretically. We also have to apply this method to other domains which have data with missing features, and analyze the effectiveness to maximize performance.

Acknowledgment This research was supported by the Industrial Strategic Technology Development Program (10044828) funded by the Ministry of Trade, Industry and Energy (MI, Korea), and the ICT R&D Program 2013 funded by the MSIP (Ministry of Science, ICT & Future Planning, Korea). References [1] Facebook 〈http://www.facebook.com〉. [2] MySpace 〈http://www.myspace.com〉.

[3] M. Davis, J. Canny, V.N. House, N. Good, S. King, R. Nair, C. Burgener, R. Strickland, G. Campbell, S. Fisher, N. Reid, MMM2: Mobile Media Metadata for Media Sharing, ACM MM 2005, 2005, pp. 267–268. [4] V. Bellotti, B. Begole, H.E. Chi, N. Ducheneaut, J. Fang, E. Isaacs, T. King, W.M. Newman, K. Partridge, B. Price, P. Rasmussen, M. Roberts, J.D. Schiano, A. Walendowski, Activity-based serendipitous recommendations with the Magitti mobile leisure guide, in: CHI 2008 Proceedings, 2008, pp. 1157–1166. [5] J.R. Kwapisz, G.M. Weiss, S.A. Moore, Activity recognition using cell phone accelerometers, in: 4th ACM SIGKDD International Workshop on Knowledge Discovery from Sensor Data, 2010. [6] M. Böhmer, G. Bauer, A. Krüger, Exploring the design space of context-aware recommender systems that suggest mobile applications, in: 2nd Workshop on Context-Aware Recommender Systems, 2010. [7] J. Paek, J. Kim, R. Govindan, Energy-efficient rate-adaptive GPS-based positioning for smartphones, in: Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, 2010, pp. 299-314. [8] N. Ravi, J. Scott, L. Han, L. Iftode, Context-aware battery management for mobile phones, in: Proceedings of the Sixth Annual IEEE International Conference on Pervasive Computing and Communications, 2008, pp. 224–233. [9] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, NJ, USA, 1973. [10] D. Yarowsky, Unsupervised word sense disambiguation rivaling supervised methods, in: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995, pp. 189–196. [11] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, pp. 92–100. [12] R. Jacobs, M. Jordan, S. Nowlan, G. Hinton, Adaptive mixture local experts, Neural Computation 3 (4) (1991) 79–87. [13] W. Wang, Z.-H. Zhou, Analyzing co-training style algorithms, Lecture Notes in Computer Science 4701 (2007) 454–465. [14] N. Győrbíró, Á. Fábián, G. Hományi, An activity recognition system for mobile phone, Mobile Networks and Applications 14 (1) (2009) 82–91. [15] M. Berchtold, M. Budde, D. Gordon, H. Schmidtke, M. Beigl, ActiServ: activity recognition service for mobile phones, in: Proceedings of the 2010 International Symposium on Wearable Computers, 2010, pp. 1–8. [16] Z. Zhao, Y. Chen, J. Liu, Z. Shen, M. Liu, Cross-people Mobile-phone Based Activity Recognition, IJCAI 2011, 2011, pp. 2545–2550. [17] Z. Zhao, Y. Chen, J. Liu, M. Liu, Cross-mobile ELM based activity recognition, International Journal of Engineering and Industries 1 (1) (2010) 30–40. [18] J. Frank, S. Mannor, D. Precu, Activity recognition with mobile phones, in: Proceedings of European Conference on Machine Learning 2011 Demo Session, 2011. [19] L. Sun, D. Zhang, B. Li, B. Guo, S. Li, Activity recognition on an accelerometer embedded mobile phone with varying positions and orientations, in: Proceedings of the 7th International Conference on Ubiquitous Intelligence and Computing, 2010, pp. 548–562. [20] T. Brezmes, J. Gorricho, J. Cotrina, Activity recognition from accelerometer data on a mobile phone, in: Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living, 2009, pp. 796–799. [21] G. Bieber, J. Voskamp, B. Urban, Activity recognition for everyday life on mobile phones, in: Proceedings of the 5th International on Conference Universal Access in Human-Computer Interaction, Part II: Intelligent and Ubiquitous Interaction Environments, 2009, pp. 289–296. [22] J. Yang, Toward physical activity diary: motion recognition using simple acceleration features with mobile phones, in: Proceedings of the 1st International Workshop on Interactive Multimedia for Consumer Electronics, 2009, pp. 1–9. [23] N. Ravi, N. Dandekar, P. Mysore, M.L. Littma, Activity recognition from accelerometer data, in: Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence, 2005, pp. 1541–1546. [24] L. Bao, S.S. Intille, Activity recognition from user-annotated acceleration data, Pervasive 2004 (2004) 1–17. [25] Y.-J. Hong, I.-J. Kim, S.C. Ahn, H.-G. Kim, Mobile health monitoring system based on activity recognition using accelerometer, Simulation Modelling Practice and Theory 18 (4) (2010) 446–455. [26] Y. Wang, J. Lin, M. Annavaram, Q.A. Jacobson, J. Hong, B. Krishnamachari, and N. Sadeh, A framework of energy efficient mobile sensing for automatic user state recognition, in: Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services, 2009, pp. 179–192. [27] T. Choudhury, S. Consolvo, B. Harrison, J. Hightower, A. LaMarca, L. LeGrand, A. Rahimi, A. Rea, G. Borriello, B. Hemingway, P.Pedja Klasnja, K. Koscher, J.A. Landay, J. Lester, D. Wyatt, D. Haehnel, The mobile sensing platform: an embedded activity recognition system, IEEE Pervasive Computing 7 (2) (2008) 32–41. [28] S. Consolvo, D.W. McDonald, T. Toscos, M.Y. Chen, J. Froehlich, B. Harrison, P. Klasnja, A. LaMarca, L. LeGrand, R. Libby, I. Smith, J.A. Landay, Activity sensing in the wild: a field trial of ubifit garden, in: Proceedings of the Twenty-sixth Annual SIGCHI Conference on Human Factors in Computing Systems, 2008, pp. 1797–1806. [29] E.D. Ubeyli, Wavelet/mixture of experts network structure for EEG signals classification, Expert Systems with Applications 34 (3) (2008) 1954–1962. [30] R. Ebrahimpour, E. Kabir, H. Esteky, M.R. Yousefi, View-independent face recognition with mixture of experts, Neurocomputing 71 (4–6) (2008) 1103–1107.

Y.-S. Lee, S.-B. Cho / Neurocomputing 126 (2014) 106–115

[31] L.I. Kuncheva, Clustering-and-selection model for classifier combination, in: Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol. 1, 2000, pp. 185–188. [32] R. Ebrahimpour, E. Kabir, M.R. Yousefi, Face detection using mixture of MLP experts, Neural Processing Letters 26 (1) (2007) 69–82. [33] C.A.M. Lima, A.L.V. Coelho, F.J.V. Zuben, Hybridizing mixtures of experts with support vector machines: investigation into nonlinear dynamic systems identification, Information Sciences 177 (10) (2007) 2049–2074. [34] C.M. Bishop, M. Svensén, Bayesian hierarchical mixtures of experts, in: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 2003, pp. 57–64. [35] A.H.R. Ko, R. Sabourin, A.S. Britto Jr., From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (5) (2008) 1718–1731. [36] C. Rosenberg, M. Hebert, H. Schneiderman, Semi-supervised self-training of object detection models, in: IEEE 7th Workshops on Application of Computer Vision, vol. 1, 2005, pp. 29–36. [37] K. Nigam, R. Ghani, Analyzing the effectiveness and applicability of cotraining, in: Proceedings of the 9th International Conference on Information and Knowledge Management, 2000, pp. 86–93.

Young-Seol Lee is a Ph.D. candidate in computer science at Yonsei University. His research interests include Bayesian networks and evolutionary algorithms for context-aware computing and intelligent agents. He received his M.S. degree in computer science from Yonsei University.

115

Sung-Bae Cho received the B.S. degree in computer science from Yonsei University, Seoul, Korea and the M.S. and Ph.D. degrees in computer science from Korea Advanced Institute of Science and Technology (KAIST), Taejeon, Korea. He was an Invited Researcher of Human Information Processing Research Laboratories at Advanced Telecommunications Research (ATR) Institute, Kyoto, Japan from 1993 to 1995, and a Visiting Scholar at University of New South Wales, Canberra, Australia in 1998. He was also a Visiting Professor at University of British Columbia, Vancouver, Canada from 2005 to 2006. Since 1995, he has been a Professor in the Department of Computer Science, Yonsei University. His research interests include neural networks, pattern recognition, intelligent man–machine interfaces, evolutionary computation, and artificial life. Dr. Cho was awarded outstanding paper prizes from the IEEE Korea Section in 1989 and 1992, and another one from the Korea Information Science Society in 1990. He was also the recipient of the Richard E. Merwin prize from the IEEE Computer Society in 1993. He was listed in Who's Who in Pattern Recognition from the International Association for Pattern Recognition in 1994, and received the best paper awards at International Conference on Soft Computing in 1996 and 1998. Also, he received the best paper award at World Automation Congress in 1998, and listed in Marquis Who's Who in Science and Engineering in 2000 and in Marquis Who's Who in the World in 2001. He is a Senior Member of IEEE and a Member of the Korea Information Science Society, INNS, the IEEE Computational Intelligence Society, and the IEEE Systems, Man, and Cybernetics Society.