Active and adaptive ensemble learning for online activity recognition from data streams

Accepted Manuscript

Active and Adaptive Ensemble Learning for Online Activity Recognition from Data Streams

Bartosz Krawczyk

PII: S0950-7051(17)30451-3
DOI: 10.1016/j.knosys.2017.09.032
Reference: KNOSYS 4053

To appear in: Knowledge-Based Systems

Received date: 20 January 2017
Revised date: 25 September 2017
Accepted date: 26 September 2017

Please cite this article as: Bartosz Krawczyk, Active and Adaptive Ensemble Learning for Online Activity Recognition from Data Streams, Knowledge-Based Systems (2017), doi: 10.1016/j.knosys.2017.09.032

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

• Novel ensemble learning algorithm for online activity recognition
• Adaptive and accurate mining of non-stationary sensor data streams
• Adaptive classifier weight calculation scheme
• Active learning module to reduce the labelling cost
• One-vs-one decomposition with evolving classifiers for multi-class classification



Active and Adaptive Ensemble Learning for Online Activity Recognition from Data Streams

Bartosz Krawczyk a,∗

a Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA

Abstract


Activity recognition is one of the emerging trends in the domain of mining ubiquitous environments. It assumes that we can recognize the current action undertaken by the monitored subject on the basis of the outputs of a set of associated sensors. Often different combinations of smart devices are used, thus creating an Internet of Things. Such data arrive continuously during the operation time of the sensors and require online processing in order to keep a real-time track of the current activity. This forms a natural data stream problem with the potential presence of changes in the arriving data. Therefore, we require an efficient online machine learning system that can offer high recognition rates and adapt to drifts and shifts in the stream. In this paper we propose an efficient and lightweight adaptive ensemble learning system for real-time activity recognition. We use a weighted modification of the Naïve Bayes classifier that can swiftly adapt itself to the current state of the stream without the need for an external concept drift detector. To tackle the multi-class nature of the activity recognition problem we propose to use a one-vs-one decomposition to form a committee of simpler and diverse learners. We introduce a novel weighted combination for one-vs-one decomposition that can adapt itself over time. Additionally, to limit the cost of supervision we propose to enhance our classification system with an active learning paradigm that selects only the most important objects for labeling and works under a constrained budget. Experiments carried out on six data streams gathered from ubiquitous environments show that the proposed active and adaptive ensemble offers excellent classification accuracy with a low requirement for access to true class labels.


Keywords: Data streams, Ensemble learning, One-vs-One, Active learning, Concept drift, Activity recognition

∗ Corresponding author
Email address: [email protected] (Bartosz Krawczyk)

Preprint submitted to Knowledge-Based Systems, September 27, 2017

1. Introduction

Nowadays we deal with the phenomenon of a data flood. There is an ever-increasing amount of data being generated, stored, transmitted [48] and processed [22]. Such massive collections of data should be mined in order to discover useful knowledge and patterns, and to offer value to academia, industry and society. However, such datasets are highly complex and require novel data mining and machine learning methods. This originates from the so-called 5V's of big data: Volume, Velocity, Variety, Veracity and Value. From the machine learning perspective the first two characteristics are especially interesting. Nowadays the volume of data to be processed often exceeds the capabilities of standard computing systems [5]. Therefore, we require efficient computing platforms that are able to handle such massive datasets in an acceptable time [10]. However, these solutions are usually restricted to static data processing, while in most real-life applications data arrive continuously. Social media, video analytics, fault detection and healthcare systems are but a few examples of problems in which we need to tackle not only massive amounts of information, but also their velocity. Such a nature of incoming data is known as a data stream [11] and requires learning models that can swiftly adapt to incoming objects with limited consumption of computational resources [50]. Additionally, we assume that the properties of a stream may change over time (with varying severity) and that the previously learned model may become obsolete. This is the so-called concept drift [17], and developing methods to tackle such non-stationary scenarios is currently of high interest to the machine learning community. Concept drift may originate from the nature of the data itself, from corruption of the data generation source, or from external conditions. However, from the perspective of intelligent system design we must adapt to any kind of shifts and drifts in incoming data, regardless of their nature. Among the plethora of real-life scenarios that generate data in the form of streams, sensor analysis is one of the most popular [33].
Sensor networks continuously monitor certain properties of an environment or subject and transmit their measurements to a central unit for processing and analysis. Recently, personal sensors have attracted significant attention, as they allow us to gain a deeper insight into a variety of everyday activities and conditions in a non-invasive and convenient way. Body-worn sensors together with smartphones, smartwatches and other smart devices are able to generate massive amounts of data in real-time. When such a network of devices is connected together we obtain a ubiquitous environment called the Internet of Things (IoT). It offers access to valuable data that may allow us to monitor population health, dangerous events or social dynamics. However, this requires efficient, adaptive and online algorithms in order to extract useful information [12].

In this paper we present a novel approach for activity recognition from data streams based on mining sensor networks. Detecting the current action of a human being is of high interest from various perspectives: one may use it for monitoring elderly and sick individuals, supervision of people with a criminal background, or for optimizing smart house performance, to name a few [28]. We propose an active and adaptive ensemble learning method for online classification of activities from sensor readings arriving in the form of drifting data streams. Ensemble solutions are among the most popular algorithms for mining data streams [25]. To allow for an ongoing adaptation to changes in data we use our modification of the Naïve Bayes classifier that automatically follows the changes in the stream by using an instance weighting solution [26]. To increase its effectiveness for recognizing one of several activity classes we apply an ensemble paradigm by training a separate classifier on



decomposed binary subproblems obtained with the one-vs-one method. This is further augmented by a dynamic weighted classifier combination that promotes those classifiers that are most competent during the current state of the stream. Finally, as we deal with a growing collection of data from sensors, we cannot assume that we will have access to true class labels for each arriving object. Therefore, we apply an active learning strategy for limiting the cost of supervision and selecting only important samples for labeling. We show that using the mentioned components to create an online ensemble leads to an adaptive and lightweight classification method that displays high efficacy in terms of both accuracy and computational cost. The main contributions of this work are as follows:


1. A classification framework for activity recognition from non-stationary sensor data streams using an adaptive version of the Naïve Bayes classifier that does not require an explicit drift detector.
2. Empowering this classifier with ensemble learning based on one-vs-one multi-class decomposition, which leads to simpler classifiers with reduced computational complexity.
3. Enhancing the decomposition scheme with an adaptive weighting approach for dynamic competence-based classifier combination.
4. An active learning strategy for label query with a restricted budget that displays an increased labeling ratio when drift occurs.
5. A thorough experimental study on diverse real-life datasets proving the high efficiency of the proposed approach in terms of predictive accuracy, time and memory requirements in comparison with state-of-the-art methods.


The rest of this manuscript is organized as follows. The next section gives the necessary background on activity recognition from data streams. Section 3 presents in detail the proposed active and adaptive ensemble learning scheme, while experimental evaluations are to be found in Section 4. The final section provides conclusions and directions for future work.

2. Activity recognition from data streams


Data from various sensors, including accelerometers and gyroscopes built into smartphones and smartwatches, is generated and transmitted continuously in the form of data streams [20]. In order to learn from such a stream one may select one of two approaches: batch-based [54] or online-based [29]. In the first one we assume that objects arrive in the form of chunks or batches and we must process them once such a set of examples becomes available. In online learning we process incoming objects one by one and adapt our system accordingly. Batch mode offers better robustness to local fluctuations of the stream and a broader outlook on the incoming objects, allowing us to train new classifiers on such chunks or to use sliding windows for incremental learning and forgetting. Online mode forces us to modify our learner sample by sample, making it more vulnerable to noise or outliers, but at the same time allowing us to mine the stream on-the-fly and quickly adapt to appearing drifts. From the activity recognition point of view the online learning approach is the more suitable one,


as we cannot allow for any delay in the decision making process. Actions must be properly recognized immediately after their initialization, especially in sensitive cases such as assisted living or elderly supervision [44], where any kind of delay may lead to a longer dispatch time of medical assistance [47]. Therefore, one cannot wait for a batch of data to be collected and must classify incoming data and adapt the learning model in real-time. An online learner must exhibit three main characteristics [7]:

• process each object only once;
• have limited computational and time requirements;
• its training procedure can be stopped at any time and the obtained accuracy should not be lower than that displayed by a model trained on identical data in batch mode.


Online learners can employ additional mechanisms for drift detection that will control their update ratio [17], or automatically adapt to the current state of the stream by making the recent samples more important during the update process. Standard activity recognition approaches assume a static data scenario [34]. However, recently more and more attention is being paid to tackling this problem from an online and streaming perspective. This approach is often known as real-time or continuous activity recognition. Pärkkä et al. proposed a dynamic classification approach based on data registration with PDAs and its processing with the usage of decision trees, where each base learner had its parameters optimized according to the personal specifications of a user [36]. Wan et al. [49] introduced a real-time segmentation algorithm for sensor data that is based on time and sensor correlation. An application of feature extraction based on incremental deep learning with limited computational cost was discussed in [21]. Zhu and Sheng utilized a Hidden Markov Model combined with Bayesian fusion of motion and local data for online classification of activities [56]. Ordóñez et al. [35] proposed to use the evolving fuzzy classifier eClass1 [2] for activity classification in both static and online modes. An online multi-task learner capable of automatic discovery of task relationships from real-world data was discussed in [43]. Recently Abdallah et al. [1] developed an online clustering-based classifier for automatic activity recognition from continuously changing data streams with limited supervision. An online active learning approach using bootstrapped classifiers was presented in [32].

3. Active and adaptive ensemble


In this section we describe in detail the proposed active and adaptive ensemble classifier.

3.1. General framework

We propose a novel online ensemble scheme, combining the principles of active and adaptive learning. As base learners we apply a weighted modification of Naïve Bayes that can deal with online data and continuously adapt to the ongoing changes in the data stream. However, these classifiers tend to work more efficiently with a smaller number



of classes. Additionally, we must take into consideration the fact that concept drift may not affect the entire decision space, but only some part of it. Therefore, such local drifts may be limited only to a subset of classes, while the characteristics of the remaining ones are unaffected. To cope with this problem and offer improved discriminative power we propose to combine these adaptive classifiers into an ensemble. Multiple classifier approaches [53] are considered a highly efficient tool for mining data streams [45; 51]. We use the divide-and-conquer principle realized as a multi-class decomposition [8]. This allows us to form an ensemble in which each base classifier tackles only a binary task, significantly reducing both the computational complexity and the difficulty of the learning process. In order to reconstruct the original multi-class task we use a one-vs-one (OVO) combination scheme based on weighted voting. We extend this approach by proposing a dynamic weighting scheme that recalculates the weights for each processed object. Finally, we embed an active learning approach into the proposed ensemble scheme to make it applicable to real-life scenarios where access to true class labels is highly limited. We aim at selecting only those objects that will be of value to the ensemble updating procedure. Classifier adaptation and weight recalculation are done only for these labeled objects, thus making the entire online learning procedure cost-efficient. We use an adaptive active learning strategy, contrary to the fixed-threshold strategies used so far in active learning, obtaining a much better label query procedure for non-stationary sensor data streams.

The general framework for the proposed active and adaptive ensemble (AAE) is depicted in Algorithm 1. It showcases the general pipeline of the proposed system. Firstly, a new instance arrives from the sensor stream. It is classified by the existing ensemble with the previously set weights. After the prediction is made, the instance is passed to the active learning module, which decides whether a label should be obtained from an expert to update the learning system. If the instance is deemed of interest to the ensemble, a true class label is obtained at the cost of the available budget. Then every binary classifier that uses this particular class is updated with the new instance. Finally, the correctness of each classifier's prediction forms the basis for recalculating the weights of the ensemble members.


3.2. Weighted Naïve Bayes classifier with forgetting

The Naïve Bayes classifier is popular for handling growing collections of data due to its simplicity, online nature and low time and memory complexity [39]. However, when dealing with data streams one needs to take into consideration the possibly non-stationary nature of incoming data. Here one cannot simply incrementally process incoming instances, as reacting to the presence of concept drift becomes a crucial factor influencing the performance of the classifier. The Naïve Bayes classifier has no ability to deal with drifting and shifting data, thus losing effectiveness in such a scenario. Therefore, one may either augment it with an external concept drift detector or modify the learning procedure to automatically adapt to changes. In this paper we use our modification of the Naïve Bayes classifier that adds a weighting factor to each of the processed instances [26]. This allows us to control the level of influence a given instance has on the a posteriori probabilities of each class. More



Algorithm 1: General framework for the proposed online active and adaptive ensemble (AAE).

input: budget B, classifiers Ψ(a,b) for class pairs (a, b), pool of OVO binary classifiers Π, set of classifier weights W, active learning strategy S(object, parameters), classifier update strategy CU(classifier, object), weight update strategy WU(object, parameters)

labeling cost b ← 0
while end of stream = FALSE do
    obtain new object x from the stream
    classify x using Π with associated OVO weights W
    if b < B then
        if S(x, parameters) = TRUE then
            obtain label y of object x
            b ← b + 1
            forall the class pairs (a, b) do
                if a = y OR b = y then
                    update the respective classifier with CU(Ψ(a,b), (x, y))
            recalculate weights for the ensemble with WU(x, y, W, Π)
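As a rough illustration (not the authors' code), the loop of Algorithm 1 can be sketched in Python. The `support`, `oracle` and `update` callbacks and the fixed uncertainty threshold are assumptions of the sketch, standing in for the WNB-CD base learners, the labeling expert and the adaptive query strategy of Section 3.4.

```python
from collections import defaultdict

# Hypothetical sketch of the AAE pipeline from Algorithm 1. A `support`
# callback plays the role of the binary WNB-CD classifiers; the names
# below are illustrative assumptions, not the paper's implementation.

class SimpleAAE:
    def __init__(self, classes, budget, threshold=0.9):
        self.classes = classes
        # one binary classifier and one weight per unordered class pair (OVO)
        self.pairs = [(a, b) for i, a in enumerate(classes)
                      for b in classes[i + 1:]]
        self.weights = {pair: 1.0 for pair in self.pairs}
        self.budget = budget        # maximum number of label queries B
        self.spent = 0              # labeling cost b
        self.threshold = threshold  # uncertainty threshold for querying

    def predict(self, x, support):
        # `support(pair, x)` returns (winning_class, confidence) for one
        # binary subproblem; weighted voting reconstructs the final label
        votes = defaultdict(float)
        for pair in self.pairs:
            cls, conf = support(pair, x)
            votes[cls] += self.weights[pair] * conf
        return max(votes, key=votes.get)

    def process(self, x, support, oracle, update):
        y_pred = self.predict(x, support)
        # query a label only while budget remains and the ensemble is unsure
        _, top_conf = max((support(p, x) for p in self.pairs),
                          key=lambda t: t[1])
        if self.spent < self.budget and top_conf < self.threshold:
            y = oracle(x)           # expert provides the true label
            self.spent += 1
            for (a, b) in self.pairs:
                if y in (a, b):
                    update((a, b), x, y)  # update affected binary learners
        return y_pred
```

The sketch keeps the key property of Algorithm 1: classification happens for every instance, but classifier and weight updates are triggered only for the (budget-limited) labeled ones.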


representative instances for each class should be assigned a higher weight, which will increase their impact on the decision boundaries being computed. One may easily translate this to the drifting data stream scenario. The most recent instances should have higher weights in order to adapt the classifier to the current state of the stream. Each incoming instance or batch of data will modify the weights and, in the case of drift presence, will give a higher importance to the ones coming from the current concept. We call our method Weighted Naïve Bayes for Concept Drift (WNB-CD).

Weighting aims at giving the highest importance to the most recently arrived instances. Thus for each instance arriving online in the current state of the data stream DS_k we assign the maximum weight:

∀ x_i ∈ DS_k : ω_i = 1.   (1)


By following only this weighting scheme, we would get a standard Naïve Bayes method, as all instances would have an identical weight. Additionally, by storing all instances extracted from the stream we potentially infinitely enlarge our data set and the memory requirements, which is impractical and leads to a poor generalization ability of our classifier. We can solve this by removing unnecessary and outdated examples [19] that are of no relevance to the current state of the stream. It seems natural that the level of importance of instances decreases with the passage of time, especially




in the case of non-stationary environments (as the current characteristics of objects can differ significantly from those of previous iterations). That is why we augment the WNB-CD model with a forgetting scheme. The most straightforward way to achieve this is to discard objects whose arrival time has exceeded a given time threshold. Yet in this way we suddenly lose all information associated with them, while it may still be of some use to the classifier (especially for gradual and incremental drifts). Therefore, modeling a smooth forgetting factor seems a much more attractive idea. We propose to realize the forgetting mechanism by directly influencing the weights of instances that were assigned using Eq. 1. Weights associated with instances should be reduced to reflect their decreasing importance for the data stream. This should continue until the weights drop below a certain level, informing the learning procedure that these instances should be discarded.

Although WNB-CD updates itself with incoming instances in an online manner, changing the weights after each new instance would put too high a computational load on the classifier. Therefore, we propose to modify the weights after a number of instances have been processed. We solve this by using a window of fixed size and recalculating the weights for each k-th set of instances. This allows our WNB-CD model to classify and update itself in an online mode and forget instances in a batch mode, achieving a balance between response time and computational complexity. With each update iteration, we penalize the objects from previous chunks by decreasing their associated weights following a given forgetting function. With this, we are able to constrain the influence of older instances on the process of calculating the a posteriori probabilities for each class. We use a forgetting mechanism for weight decay implemented as the following sigmoidal function:


ω_i^{k+1} = ω_i^k · 2 exp(−β(k + 1)) / (1 + exp(−β(k + 1))),   (2)
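As a quick illustration, the decay of Eq. (2) can be applied as a multiplicative factor at each update iteration; the values of β and of the pruning threshold below (the paper denotes the threshold ε) are arbitrary choices for this sketch, not taken from the paper.

```python
import math

# Sketch of the sigmoidal forgetting factor of Eq. (2); beta and eps
# are illustrative values, not the paper's settings.

def decay_factor(k, beta=0.5):
    """Multiplicative weight decay applied at update iteration k."""
    z = math.exp(-beta * (k + 1))
    return 2.0 * z / (1.0 + z)

def forget(weights, k, beta=0.5, eps=0.01):
    """Decay all stored instance weights and drop those that fall below eps."""
    factor = decay_factor(k, beta)
    return [w * factor for w in weights if w * factor >= eps]
```

Since the factor lies in (0, 1) and shrinks with k, older instances are progressively discounted, and low-weight instances are eventually discarded from memory.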


where ω_i^k ∈ [0, 1] and β is responsible for the forgetting speed, being a positive integer of any selected value (the higher its value, the more rapid the forgetting becomes). It has a direct influence on the adaptation abilities of the classifier and should be selected based on the specific application, depending on whether rapid or gradual changes are to be expected. To further reduce the complexity of our classifier, we propose to introduce a weight threshold ε. All objects with weights falling below this threshold will be automatically discarded from memory. This will speed up the computation and reduce the memory requirements of our model.

Let us now present how to extend the Naïve Bayes classifier with these weighting functions. Naïve Bayes is based on calculating the posterior probability of class c_m for a given test instance with f features as follows:

p(c_m | a_1, a_2, · · · , a_f) = p(c_m) ∏_{j=1}^{f} p(a_j | c_m) / Σ_{q=1}^{M} p(c_q) ∏_{j=1}^{f} p(a_j | c_q),   (3)

where M stands for the number of classes in the given learning problem.


To apply the adaptive instance weighting schemes from Eqs. (1) and (2), we need to modify the calculation of the individual probabilities used in Eq. (3). We propose to calculate them by taking into consideration the weights assigned to each instance:

p(c_m) = (1 + Σ_{i=1}^{N} I(c_{x_i} = c_m) ω_i) / (M + Σ_{i=1}^{N} ω_i),   (4)

where N is the number of instances stored in the current state of our data stream DS, c_{x_i} is the class of the i-th training instance, and I(·) is the indicator function (assuming the value of 1 if the condition is fulfilled, and the value of 0 otherwise). The conditional probability of the j-th feature (assuming that we deal with nominal attributes) is given as:

p(a_j | c_m) = (1 + Σ_{i=1}^{N} I(a_j = a_{ij}) I(c_{x_i} = c_m) ω_i) / (n_j + Σ_{i=1}^{N} I(c_{x_i} = c_m) ω_i),   (5)


where n_j is the number of possible values of the j-th feature, and a_{ij} is the value of the j-th feature for the i-th training example. This allows us to compute a Naïve Bayes classifier for data streams with concept drift, allowing for a smooth adaptation to changes. The significant advantage of this classifier lies in its adaptation capabilities, as it will automatically follow shifts and drifts in the stream without the need for an explicit drift detector. This lifts the problem of proper drift detector selection [41] and the limitations associated with these external algorithms for monitoring changes in incoming instances [55].
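A minimal sketch of the weighted estimates in Eqs. (4) and (5) for nominal features follows; the flat `(features, label, weight)` data layout is an assumption of the sketch, not the paper's data structure.

```python
# Sketch of the weighted Naive Bayes estimates of Eqs. (4)-(5).
# `data` is a list of (feature_tuple, label, weight) triples (assumed layout).

def class_prior(data, c, n_classes):
    """Weighted Laplace-smoothed prior p(c), as in Eq. (4)."""
    num = 1.0 + sum(w for _, y, w in data if y == c)
    den = n_classes + sum(w for _, _, w in data)
    return num / den

def cond_prob(data, j, value, c, n_values):
    """Weighted conditional p(a_j = value | c), as in Eq. (5)."""
    num = 1.0 + sum(w for x, y, w in data if x[j] == value and y == c)
    den = n_values + sum(w for _, y, w in data if y == c)
    return num / den

def classify(data, x, classes, n_values_per_feature):
    """Pick the class maximising the unnormalised score of Eq. (3)."""
    scores = {}
    for c in classes:
        p = class_prior(data, c, len(classes))
        for j, value in enumerate(x):
            p *= cond_prob(data, j, value, c, n_values_per_feature[j])
        scores[c] = p
    return max(scores, key=scores.get)
```

Because each count is a sum of instance weights, applying the decay of Eq. (2) to the stored weights automatically shifts the probability estimates toward the current concept.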


3.3. Weighted one-vs-one decomposition

Decomposition is an interesting approach for handling multi-class problems that may lead to significant gains in classification performance at the cost of increased computational complexity. Here we divide the original task into a number of simplified subtasks, each characterized by a reduced number of classes. Binary decomposition is the most popular solution used here, based on the concept of transforming a multi-class task into a number of two-class problems. The two popular strategies in this domain are one-vs-one (OVO) and one-vs-all (OVA). The former creates exhaustive pairwise combinations of all classes, i.e., for an M-class problem we obtain M(M − 1)/2 classifiers. The latter takes each class as the positive one and combines all other classes as the negative one, i.e., for an M-class problem we obtain M classifiers. OVO produces simpler and locally specialized classifiers at the cost of increased ensemble size. OVA outputs much more compact ensembles, but trains each base learner on an imbalanced dataset by introducing an artificial imbalance ratio of 1:(M − 1). Recent studies show the higher effectiveness of the OVO solution [14] and thus we will use this approach. To reconstruct the original multi-class problem from the binary outputs of base classifiers, OVO uses specific combination schemes. For the proposed AAE we will use the weighted voting solution, which determines the class of object x as follows:

Class(x) = arg max_{i=1,··· ,M} Σ_{1 ≤ j ≠ i ≤ M} w_ij s_ij(x),   (6)

where

s_ij(x) = max {F_Ψ(x, i), F_Ψ(x, j)},   (7)


where F_Ψ̂(x, i) is the continuous output (support function) of a binary classifier for the i-th class (F_Ψ̂(x, i) ∈ [0, 1]) and w_ij is the weight assigned to this binary classifier (w_ij ∈ [0, 1]). The higher the weight, the bigger the importance of a given classifier in the multi-class reconstruction phase. One can see that the established weights have a major impact on the classifier combination step. In static classification one may calculate these weights beforehand with the usage of an external validation set. However, in data stream mining the competences of classifiers may change over time and the weighting function cannot be a static one. We propose to combine weighted OVO voting with a Winnow dynamic combiner [4] tailored for the data stream application. This allows us to dynamically adjust the weights of classifiers, boosting the influence of those that are competent during the current state of the stream. The Dynamic OVO-Winnow (DOVO-Win) approach is depicted in Algorithm 2.


Algorithm 2: Dynamic OVO-Winnow weight update strategy DOVO-Win(x, y, a, W, Π).

input: new object x, class label y, weight adjustment a (a ≥ 1), current set of weights W, pool of OVO binary classifiers Π
Result: set of weights W

forall the classifiers in Π that correctly classified x do
    weight ← weight × a
forall the classifiers in Π that misclassified x do
    weight ← weight / a
store new weights in W
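To make the combination concrete, here is a hedged sketch of the weighted voting of Eqs. (6)-(7) together with the Winnow update of Algorithm 2. The dictionary of pairwise supports is an assumed stand-in for the outputs of the binary WNB-CD classifiers, and applying the update to every pair whose local decision can be checked is one possible reading of the algorithm.

```python
from collections import defaultdict

# Sketch of weighted OVO voting (Eq. 6) plus the Winnow-style update of
# Algorithm 2. supports[(i, j)] = (s_i, s_j) gives the support of the
# binary classifier for pair (i, j) toward each of its two classes.

def ovo_predict(supports, weights):
    """Return the weighted-vote winner over all pairwise classifiers."""
    votes = defaultdict(float)
    for (i, j), (s_i, s_j) in supports.items():
        winner = i if s_i >= s_j else j
        # each pair contributes its weight times its strongest support
        votes[winner] += weights[(i, j)] * max(s_i, s_j)
    return max(votes, key=votes.get)

def winnow_update(supports, weights, y, a=2.0):
    """Multiply the weight of every pairwise classifier whose local
    decision matched the true label y by a, divide the rest by a."""
    for (i, j), (s_i, s_j) in supports.items():
        local = i if s_i >= s_j else j
        if local == y:
            weights[(i, j)] *= a
        else:
            weights[(i, j)] /= a
    return weights
```

Repeated correct decisions thus grow a pair's weight geometrically, which matches the paper's point that classifiers competent for the currently performed activity are quickly boosted and stay influential for its duration.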


There are two main advantages of the DOVO-Win approach for activity recognition from data streams:

• It updates the weights only for labeled objects returned by the active learning strategy, which allows for more stable weights (not changed after every single object) and a reduced computational cost.

• From the activity recognition point of view, each action is performed over a given period of time. Therefore, the proposed weighting scheme allows us to quickly boost the importance of the classifiers competent to recognize this specific action right after it has been detected by the active learning approach, and to maintain the high importance of these relevant classifiers over its duration.



Here one should notice a potential drawback of the OVO approach: the possibility of taking into account non-competent classifiers during the multi-class reconstruction phase. When testing a new object, it will be presented to all classifiers in the pool. However, some of them are trained on classes different from the one to which the given instance belongs. This means that they have no possibility to properly dichotomize this object, while at the same time not being able to simply reject it. Therefore, during the combination phase some of the local decisions will be returned by such incompetent classifiers, thus leading to a potential degradation of the OVO accuracy. This issue has been discussed and addressed in static scenarios [15], but to the best of our knowledge it has not yet been considered in the data stream mining area. The main difficulty lies in the prohibitive computational cost of the existing methods for pruning incompetent classifiers in OVO, so they cannot be directly applied to high-speed online learning. Furthermore, some classes may be much more difficult to analyze than others, leading to a mix of simple and compound decision boundaries [13; 30]. Once again, solutions exist for static cases, but cannot be directly translated to non-stationary streaming scenarios. The proposed DOVO-Win partially alleviates this problem by boosting the best performing classifiers for the current state of the stream.

3.4. Active learning


Most of works in data streams domain assume that a true class label for each object is available immediately after processing it. This is far from being realistic and would impose prohibitive labeling costs. Therefore, we need a way to reduce the number of objects to be labeled and make queries only for important objects that can contribute to the adaptation of the learning system. Active learning has gained an attention of data stream mining community in recent years [27; 52; 57]. We must also take into consideration the fact that data stream changes with time. This limits the usability of active learning strategies for static scenarios, as they are based on fixed set of rules. Efficient labeling approach for non-stationary data streams must be able to save its budget for most valuable samples, i.e., ones appearing during the concept drift. We propose to use randomized variable uncertainty strategy (R-VAR-UN) [57]. ˆ decision expressed as its It is based on monitoring the certainty of AAE classifier Ψ support functions FΨ ˆ (x, j) for object x belonging to j-th class. It aims to label the least certain instances gathered from a stream. A time-variable threshold imposed on AAE’s certainty is being used. It adjusts itself depending on the incoming data to balance the budget use over time. Threshold is further modified by a random factor. This allows for labeling some of the examples to which AAE displays a high certainty in order not to miss any possible drifts that may appear in any part of the decision space. However, this happens at the expense of sacrificing some of uncertain instances. Thus this strategy is expected to perform worse for static periods, but adapt faster to occurring changes, making it highly suitable for activity recognition from non-stationary data streams. The details of this strategy are given in Algorithm 3. 4. Experimental study The following experimental study was designed to answer the following questions: 11

Algorithm 3: Active learning strategy R-VAR-UN(x, θ, s, δ, Ψ̂)
input: new object x, threshold θ, threshold adjustment s ∈ [0, 1], threshold random variance δ, trained AAE ensemble Ψ̂
Result: labeling ∈ {TRUE, FALSE}

η ← random multiplier drawn from N(1, δ)
θ_rand ← θ × η
if max_{j∈M} F_Ψ̂(x, j) < θ_rand then
    decrease the uncertainty threshold: θ ← θ − s
    labeling ← TRUE
else
    increase the uncertainty threshold: θ ← θ + s
    labeling ← FALSE
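The randomized variable uncertainty test of Algorithm 3 can be sketched in a few lines of Python (a hedged illustration; the function and parameter names are mine, and `support` stands for the ensemble's maximum support max_{j∈M} F_Ψ̂(x, j) for the current instance):

```python
import random

def r_var_un(support, theta, s=0.05, delta=1.0, rng=random):
    """One step of the R-VAR-UN labeling decision.

    support: maximum class support of the AAE ensemble for instance x
    theta:   current uncertainty threshold
    s:       threshold adjustment step in [0, 1]
    delta:   variance of the random multiplier
    Returns (labeling, updated_theta).
    """
    eta = rng.gauss(1.0, delta)        # random multiplier drawn from N(1, delta)
    theta_rand = theta * eta           # randomized threshold
    if support < theta_rand:
        # uncertain instance: request its true label, tighten the threshold
        return True, theta - s
    # confident instance: skip labeling, relax the threshold to save budget
    return False, theta + s
```

Because the threshold is multiplied by a random factor, occasionally even high-certainty instances are queried, which is what lets the strategy catch drifts in well-learned regions of the decision space.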

• can the proposed AAE approach offer improved activity recognition accuracy in comparison to state-of-the-art single and ensemble learners;

• is there an advantage to using the proposed adaptive version of Naïve Bayes as a base classifier over existing models;

• does the proposed AAE have limited computational requirements that would allow for its real-life application in mining ubiquitous environments;

• does the embedded active learning module allow for maintaining a high recognition accuracy while reducing the labeling costs.

In the following subsections we present the datasets used, the detailed set-up of our experiments, and a discussion that answers these questions.

4.1. Activity recognition datasets


For experimental purposes we used six datasets coming from two repositories. Home activity recognition [46] is a collection of datasets gathered from sensor networks installed in smart housing environments 1. Three houses were examined, each with a different setting and a varying number of sensor nodes. To make the datasets comparable, every setting was based on identical event-based sensors measuring the same activities. To obtain data suitable for classification, we segmented the sensor readings into time slices of 60 seconds each. After segmentation we obtained three datasets, one per house.

1 https://sites.google.com/site/tim0306/datasets

Data from


House A consists of 33 120 instances, House B of 17 280 instances, and House C of 24 480 instances. Seven types of activities were recorded: leaving, toileting, showering, sleeping, breakfast, dinner, and drinking. Localization data for person activity [23] contains recordings of five people performing different activities 2. Each person wore four sensors while performing the same scenario five times. The dataset contains 164 860 instances described by coordinates, where a single instance is the localization data gathered from one of the sensors. Eleven types of activities were recorded: walking, falling, lying down, lying, sitting down, sitting, standing up from lying, on all fours, sitting on the ground, standing up from sitting, and standing up from sitting on the ground. Heterogeneity Activity Recognition [42] contains the readings of two motion sensors commonly found in smartphones 3. Readings were recorded while users carrying smartwatches and smartphones executed scripted activities in no specific order. Nine users with eight smart devices served as the source of data generation. There are six activities to be recognized. This dataset is characterized by a large-scale volume (over 43 000 000 instances). Human Activity Recognition Using Smartphones [40] was collected from 30 people using a Samsung Galaxy S II smartphone attached to the waist 4. Its embedded accelerometer and gyroscope were used to capture 3-axial linear acceleration and 3-axial angular velocity at a rate of 50 Hz. There are six activities to be recognized. This dataset is characterized by a large feature space (for an activity recognition dataset) of 561 dimensions. To analyze these datasets we used raw sensor data without any feature transformation, to reduce the computational overhead of the proposed system. We treat them as non-stationary data streams to be processed in online mode.
Therefore, we divide each dataset into two parts: a small one for training the initial classification system and a larger one treated as the incoming stream. A summary of these benchmarks is given in Table 1.

Table 1: Details of data stream benchmarks used in the experiments.

Data set       Training instances   Stream instances   Features   Classes
House A        120                  33 000             14         7
House B        280                  17 000             23         7
House C        480                  24 000             21         7
Localization   860                  164 000            4          11
HAR            930 257              43 000 000         16         6
HARUS          299                  10 000             561        6

2 http://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity
3 https://archive.ics.uci.edu/ml/datasets/Heterogeneity+Activity+Recognition
4 https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones


4.2. Set-up

As reference methods for the proposed AAE we selected six popular online classifiers for mining drifting data streams, some of which are also commonly used in the online activity recognition domain. Their details and parameters are given in Table 2. The parameters for these methods were selected on the basis of the author's previous work with adaptive and forgetting classifiers for data streams [26].

Table 2: Details of used classifiers.

Abbr.     Model                                     Parameters
NB        Naïve Bayes                               -
WNB-CD    Weighted Naïve Bayes for Concept Drift    forgetting factor β = 9; weight threshold ε = 0.1; weight update batch size = 100
eClass1   Evolving fuzzy classifier [2]             -
HTree     Hoeffding tree [37]                       adaptive Naïve Bayes leaf predictions; grace period nmin = 100; split confidence δ = 0.01; tie-threshold τ = 0.05; drift detector = Early Drift Detection (EDD)
LEV       Leveraging Bagging [3]                    base classifier = HTree; no. of classifiers in the pool = 30
DWM       Dynamic Weighted Majority [24]            base classifier = HTree; no. of classifiers in the pool = 30
AAE       Active and Adaptive Ensemble              base classifier = WNB-CD; a = 1.10; θ = 0.75; s = 0.05
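The WNB-CD base learner in Table 2 maintains Naïve Bayes statistics with forgetting. As a rough, hedged illustration of that idea (not the exact algorithm of [26], whose weighting scheme is richer), a Naïve Bayes whose sufficient statistics are exponentially decayed before each update could look like:

```python
from collections import defaultdict

class ForgettingNB:
    """Illustrative Naive Bayes with exponential forgetting of counts."""

    def __init__(self, beta=0.9):
        self.beta = beta                       # forgetting factor (illustrative value)
        self.class_counts = defaultdict(float)
        self.feat_counts = defaultdict(float)  # (class, feature index, value) -> count

    def update(self, x, y):
        # decay all sufficient statistics, then absorb the new labeled example
        for k in self.class_counts:
            self.class_counts[k] *= self.beta
        for k in self.feat_counts:
            self.feat_counts[k] *= self.beta
        self.class_counts[y] += 1.0
        for i, v in enumerate(x):
            self.feat_counts[(y, i, v)] += 1.0

    def predict(self, x):
        total = sum(self.class_counts.values()) or 1.0
        best_c, best_p = None, -1.0
        for c, cc in list(self.class_counts.items()):
            p = cc / total
            for i, v in enumerate(x):
                p *= (self.feat_counts[(c, i, v)] + 1.0) / (cc + 2.0)  # Laplace smoothing
            if p > best_p:
                best_c, best_p = c, p
        return best_c
```

Recent examples dominate the decayed counts, so the classifier drifts with the stream without requiring an explicit change detector.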

These classifiers are evaluated in an online test-then-train mode, meaning that each incoming sample is first classified and then used to update the learner. As we deal with imbalanced multi-class datasets, we use the average accuracy metric. It gives the same weight to each class: the accuracy rate of each class is obtained independently, and these values are then averaged to produce the final result. The average accuracy is computed as follows:

AveAcc = (1/M) · Σ_{i=1}^{M} TPR_i,    (8)
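In code, Eq. (8) is simply a macro-averaged recall, and its prequential variant keeps only the most recent predictions. A minimal sketch (function names are mine):

```python
from collections import defaultdict, deque

def average_accuracy(true_pos, totals):
    """Eq. (8): mean of per-class true positive rates, in percent."""
    rates = [true_pos[c] / totals[c] for c in totals if totals[c] > 0]
    return 100.0 * sum(rates) / len(rates) if rates else 0.0

def prequential_avg_acc(pairs, window=1000):
    """pairs: iterable of (true_label, predicted_label) in stream order.
    Only the most recent `window` examples contribute, so outdated
    samples are forgotten."""
    recent = deque(pairs, maxlen=window)     # keep only the tail of the stream
    true_pos, totals = defaultdict(int), defaultdict(int)
    for y, y_hat in recent:
        totals[y] += 1
        if y == y_hat:
            true_pos[y] += 1
    return average_accuracy(true_pos, totals)
```

Shrinking `window` makes the metric react faster to concept drift at the cost of a noisier estimate.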


where M is the number of classes and TPR_i stands for the True Positive Rate of the i-th class. We apply this metric in a prequential setting [16]: the average accuracy is computed only over the most recent examples, instead of over the entire stream, to allow outdated samples to be forgotten. The considered datasets are discretized beforehand using Fayyad and Irani's MDL-based scheme [9], as proper feature extraction and discretization is beneficial for Bayesian-based classifiers [6]. This was done in a static way over the entire dataset; if online discretization were required, this component could easily be swapped for one of the online discretizers discussed in our other work [38]. To allow for a statistical analysis of the results, we use the Wilcoxon signed-rank test [18] as a non-parametric procedure for pairwise comparisons between the proposed AAE method and the reference classifiers, assuming a significance level of α = 0.05. The experiments were performed in the R environment on a machine equipped with an Intel Core i7-4700MQ Haswell @ 2.40 GHz processor and 24.00 GB of RAM.

4.3. Results


We have evaluated the proposed methods in two settings: a fully labeled data stream, and active learning bounded by a limited budget. The overall classification accuracies for fully labeled streams are presented in Table 4, while the detailed accuracies over the entire course of data stream processing are depicted in Figure 1. Tables 5 and 6 present the averaged classification times and memory consumption of the examined classifiers. Additionally, we present the influence of the DOVO-Win combination and its improvements over standard OVO in Table 3. Outcomes of the Wilcoxon statistical significance test are shown in Table 7. Figure 2 presents the dependency between the obtained accuracies and the size of the labeling budget in the active learning scenario.

Table 3: Comparison of prequential average accuracies returned by the proposed AAE approach with standard OVO and with DOVO-Win. WNB-CD used as a base classifier.

Data set       AAE-OVO   AAE-DOVO-Win
House A        71.32     74.64
House B        69.92     71.39
House C        73.28     75.70
Localization   85.14     88.06
HAR            90.78     91.48
HARUS          83.82     85.19
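The weighted one-vs-one reconstruction behind the AAE-DOVO-Win column can be sketched as a weighted vote over the binary classifiers; the adaptive computation of the weights themselves is abstracted away here, and all names are mine:

```python
from collections import defaultdict
from itertools import combinations

def ovo_predict(x, classifiers, weights, classes):
    """classifiers[(i, j)](x) returns the winner (i or j) of that binary
    sub-problem; weights[(i, j)] is the classifier's current competence
    weight, updated elsewhere as the stream drifts."""
    votes = defaultdict(float)
    for i, j in combinations(classes, 2):
        winner = classifiers[(i, j)](x)
        votes[winner] += weights[(i, j)]   # competent classifiers count more
    return max(votes, key=votes.get)
```

Raising the weight of a drift-adapted binary classifier immediately raises the influence of its vote, which is how the ensemble promotes its currently competent members.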

4.4. Discussion

Let us now take a look at the obtained results. First, we analyze the scenario with full access to class labels. On all six examined datasets, using the proposed DOVO-Win classifier combination scheme leads to significant improvements in classification performance. This shows that the competence of base classifiers in OVO for drifting data streams

[Figure 1: Prequential average accuracies of the examined methods for the six fully labeled sensor data streams; panels: (a) House A, (b) House B, (c) House C, (d) Localization, (e) HAR, (f) HARUS. Results are presented over batches of data (1000 instances for the House A/B/C datasets and 4000 instances for the Localization dataset).]

[Figure 2: Dependencies between classification accuracy and budget size for the active learning approach for streaming activity recognition; panels: (a) House A, (b) House B, (c) House C, (d) Localization, (e) HAR, (f) HARUS.]


Table 4: Overall prequential average accuracies of examined methods.

Data set       NB      WNB-CD   eClass1   HTree   LEV     DWM     AAE
House A        59.76   66.84    67.14     62.96   70.11   68.19   74.64
House B        58.83   66.47    65.90     66.23   70.05   71.64   71.39
House C        62.58   69.74    70.11     68.60   73.55   72.72   75.70
Localization   70.05   80.20    80.94     79.50   84.67   83.37   88.06
HAR            82.14   85.01    83.17     82.39   88.39   86.02   91.48
HARUS          75.49   80.04    78.42     77.20   82.19   83.05   85.19

Table 5: Averaged classification time per 1000 instances in seconds [s].

Data set       NB     WNB-CD   eClass1   HTree   LEV     DWM    AAE
House A        0.08   0.10     0.88      0.25    2.96    0.53   0.35
House B        0.07   0.10     0.85      0.23    2.88    0.49   0.33
House C        0.10   0.13     1.12      0.24    3.02    0.57   0.39
Localization   0.11   0.14     1.68      0.33    4.06    0.98   0.54
HAR            0.15   0.22     3.95      0.74    8.07    2.03   0.98
HARUS          0.78   0.90     10.75     1.22    13.09   3.67   1.86

Table 6: Averaged memory consumption in megabytes [MB].

Data set       NB      WNB-CD   eClass1   HTree   LEV      DWM      AAE
House A        3.76    8.33     104.85    8.93    79.48    89.43    12.43
House B        3.08    7.66     98.32     7.03    75.05    87.13    13.38
House C        2.39    8.49     93.84     8.52    77.63    88.78    11.98
Localization   9.01    14.78    125.19    14.04   90.08    105.22   26.07
HAR            7.43    12.02    119.08    9.89    88.36    93.64    27.05
HARUS          10.47   15.98    177.01    18.30   120.72   129.33   48.95

Table 7: Wilcoxon signed-rank test comparing AAE with the reference approaches over all used datasets. The symbol "+" represents cases in which AAE was statistically significantly superior.

Comparison   vs. NB     vs. WNB-CD   vs. eClass1   vs. HTree   vs. LEV    vs. DWM
p-value      + 0.0000   + 0.0000     + 0.0000      + 0.0000    + 0.0314   + 0.0248

changes over time, and this must be accommodated during the multi-class reconstruction phase. The proposed dynamic weighting tracks these changes over time and adapts the ensemble accordingly. When taking the activity recognition accuracy into account, we can easily see that the proposed AAE offers a significant improvement on five out of six datasets. Only for House B do we get a slightly lower accuracy than DWM. This allows us to conclude that, for multi-class activity recognition scenarios, using a binary decomposition in one-vs-one mode can positively improve the obtained recognition rates. One may conclude that the proposed dynamic classifier weighting, based on current competence over the data stream, promotes better-adapted binary classifiers, which in turn benefits the entire ensemble. This should be especially useful when a high number of classes leads to a large pool of classifiers in the ensemble; this way we can dynamically modify the level of influence that base learners hold over the combined classifier. Additionally, as each binary ensemble updates itself automatically, AAE is able to track both global and local drifts, which is also reflected in the improved predictive accuracy. It is interesting to notice that a single WNB-CD returns performance similar to the much more complex eClass1 classifier, showing that it is an efficient learner on its own.

Let us now analyze the computational costs of the compared stream mining methods. The standard Naïve Bayes classifier obviously displays the lowest requirements, due to its simplicity. It is important to notice that our modification WNB-CD only slightly increases the computational requirements while leading to significantly better classification performance. Methods based on the Hoeffding tree also achieve a good balance between computational complexity and accuracy. However, eClass1, despite being a single rule-based classifier, requires a significant amount of computational resources to carry out its evolutionary tuning phase, making it less suitable for the considered task of online activity recognition from high-speed sensor data streams. The proposed AAE is characterized by a higher computational load than a single WNB-CD, as it needs to maintain a pool of classifiers and constantly update the weights. This is balanced by each base classifier being much simpler (due to the OVO decomposition), so in the end the gain in accuracy outweighs the slight increase in memory consumption. Additionally, AAE offers a reasonably fast classification time in comparison to the reference methods, which allows it to operate in real time without delaying predictions. This is of crucial importance in scenarios where we must quickly react to a potentially dangerous activity taking place (e.g., a fall being detected in an assisted living environment for the elderly). All of these advantages (the obtained accuracy rates, the rapid classification phase, and the reasonable memory consumption) make AAE a suitable choice for online activity recognition in ubiquitous environments.

Let us now take a look at Figure 1. We can see that AAE stabilizes very quickly when concept drift appears in the data, recovering faster than the remaining classifiers. This is especially visible for the House C and Localization datasets, where the proposed ensemble is much more robust to the occurring changes and displays a lower variance in accuracy. This desirable property can be attributed to two key factors. The first is the use of WNB-CD classifiers as base learners that automatically adapt to changes. By using the OVO decomposition we simplify their adaptation procedure (each learner must now update a reduced number of rules) and allow them to track local drifts (by updating only those classifiers connected with the drifting classes). Additionally, the proposed weighted combination offers further gains in accuracy, as it boosts the influence of the classifiers competent for the currently performed activity.

Let us now analyze the scenario with the active learning approach. Here, we make the realistic assumption that true class labels are not always available, and that requiring them for


each new instance would impose prohibitive labeling costs, especially in sensor and ubiquitous data mining. Therefore, we examined the performance of the classifiers with the active learning strategy described in Section 3.4, with budget sizes allowing for labeling from 5% to 50% of objects. Figure 2 displays the relations between the assumed budget and the obtained accuracies. Obviously, one does not get results identical to those for a fully labeled stream. Here, the Naïve Bayes classifier does not achieve satisfactory performance with limited access to class labels, whereas WNB-CD and the Hoeffding tree deliver good performance even with a small number of labels to work with. On the other hand, eClass1 displays a significant drop in accuracy, which can be explained by not having enough labeled instances to efficiently evolve its fuzzy rules. AAE with active learning produces very good recognition rates even with a small allowed budget (as little as 10%-15% of all instances). This can be explained by the fact that a single instance is used by a subset of classifiers in the ensemble (those trained on the specific class that the given instance originates from). This leads to improved diversity among the base classifiers, a desirable property in ensembles designed for mining drifting data streams [31]. Additionally, AAE shows stable performance without high variations caused by varying budget sizes. This proves that the proposed algorithm can work well in real-life scenarios without imposing extensive supervision costs on human experts.

5. Conclusions and future directions


In this paper we have discussed the issue of activity recognition from data streams generated by sensors working in ubiquitous environments or the IoT. Detecting the current action undertaken by a monitored person is of crucial importance in various domains, e.g., smart housing, assisted living, or person supervision. Such a classification system must work in real time, without generating delays and with limited resource consumption. Additionally, access to true class labels is limited, as every query involves a human expert and cannot be made continuously during the course of sensor monitoring. We addressed these issues by introducing a novel Active and Adaptive Ensemble for online activity recognition from non-stationary data streams. The proposed architecture utilizes weighted Naïve Bayes classifiers that adapt to the incoming data and handle changes in the stream without the need for an explicit drift detector. To improve recognition accuracy and reduce the computational complexity of each base learner, we proposed to train them using a one-vs-one decomposition on binary subsets of classes. An adaptive weighted combination is then used to reconstruct the original multi-class problem from the two-class outputs. The proposed ensemble utilizes an active learning module to reduce the labeling cost over real-time sensor data streams. Experiments conducted on six real-life datasets proved the high usefulness of the proposed ensemble classifier. Not only was it able to return excellent activity recognition accuracies within acceptable time and memory constraints, but it also stabilized easily with even a small number of labeled examples. Future work on this method will include tests with other online and adaptive classifiers, and further reducing its computational complexity to allow for mobile device-based data mining.


Acknowledgment

This work was supported by the Polish National Science Center under the grant no. DEC-2013/09/B/ST6/02264.


References

[1] Zahraa Said Abdallah, Mohamed Medhat Gaber, Bala Srinivasan, and Shonali Krishnaswamy. Adaptive mobile activity recognition system with evolving data streams. Neurocomputing, 150:304–317, 2015. [2] Plamen P. Angelov and Xiaowei Zhou. Evolving fuzzy-rule-based classifiers from data streams. IEEE Trans. Fuzzy Systems, 16(6):1462–1475, 2008.


[3] Albert Bifet, Geoffrey Holmes, and Bernhard Pfahringer. Leveraging bagging for evolving data streams. In Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part I, pages 135–150, 2010. [4] Avrim Blum. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26(1):5–23, 1997.


[5] Alberto Cano, Carlos García-Martínez, and Sebastián Ventura. Extremely high-dimensional optimization with MapReduce: Scaling functions and algorithm. Inf. Sci., 415:110–127, 2017.


[6] Alberto Cano, Sebastián Ventura, and Krzysztof J. Cios. Multi-objective genetic programming for feature extraction and data visualization. Soft Comput., 21(8):2069–2089, 2017.


[7] Pedro M. Domingos and Geoff Hulten. A general framework for mining massive data streams. Journal of Computational and Graphical Statistics, 12:945–949, 2003.


[8] Mikel Elkano, Mikel Galar, José Antonio Sanz, and Humberto Bustince. Fuzzy rule-based classification systems for multi-class problems using binary decomposition strategies: On the influence of n-dimensional overlap functions in the fuzzy reasoning method. Inf. Sci., 332:94–114, 2016.


[9] U.M. Fayyad and K.B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambéry, France, August 28 - September 3, 1993, pages 1022–1029, 1993.

[10] Alberto Fernández, Sara del Río, Victoria López, Abdullah Bawakid, María José del Jesús, José Manuel Benítez, and Francisco Herrera. Big data with cloud computing: an insight on the computing environment, mapreduce, and programming frameworks. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, 4(5):380–409, 2014.


[11] Mohamed Medhat Gaber. Advances in data stream mining. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, 2(1):79–85, 2012.


[12] Mohamed Medhat Gaber, João Gama, Shonali Krishnaswamy, João Bártolo Gomes, and Frederic T. Stahl. Data stream mining in ubiquitous environments: state-of-the-art and current directions. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, 4(2):116–138, 2014. [13] Mikel Galar, Alberto Fernández, Edurne Barrenechea, and Francisco Herrera. Empowering difficult classes with a similarity-based aggregation in multi-class classification problems. Inf. Sci., 264:135–157, 2014.


[14] Mikel Galar, Alberto Fernández, Edurne Barrenechea, and Francisco Herrera. DRCW-OVO: distance-based relative competence weighting combination for one-vs-one strategy in multi-class problems. Pattern Recognition, 48(1):28–42, 2015.

[15] Mikel Galar, Alberto Fernández, Edurne Barrenechea Tartas, Humberto Bustince Sola, and Francisco Herrera. Dynamic classifier selection for one-vs-one strategy: Avoiding non-competent classifiers. Pattern Recognition, 46(12):3412–3424, 2013.

[16] João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. On evaluating stream learning algorithms. Machine Learning, 90(3):317–346, 2013.


[17] João Gama, Indre Zliobaite, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM Comput. Surv., 46(4):44:1–44:37, 2014.


[18] Salvador García, Alberto Fernández, Julián Luengo, and Francisco Herrera. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf. Sci., 180(10):2044–2064, 2010.


[19] I. P. Gent, I. Miguel, and N. C. A. Moore. An empirical study of learning and forgetting constraints. AI Communications, 25(2):191–208, 2012.


[20] João Bártolo Gomes, Shonali Krishnaswamy, Mohamed Medhat Gaber, Pedro A. C. Sousa, and Ernestina Menasalvas Ruiz. Mobile activity recognition using ubiquitous data stream mining. In Data Warehousing and Knowledge Discovery - 14th International Conference, DaWaK 2012, Vienna, Austria, September 3-6, 2012. Proceedings, pages 130–141, 2012.

[21] Mahmudul Hasan and Amit K. Roy-Chowdhury. A continuous learning framework for activity recognition using deep hybrid feature models. IEEE Trans. Multimedia, 17(11):1909–1922, 2015.

[22] Francisco Herrera. A tour on big data classification: Selected computational intelligence approaches. In 2015 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology (IFSA-EUSFLAT-15), Gijón, Spain, June 30, 2015.


[23] Bostjan Kaluza, Violeta Mirchevska, Erik Dovgan, Mitja Lustrek, and Matjaz Gams. An agent-based approach to care in independent living. In Ambient Intelligence - First International Joint Conference, AmI 2010, Malaga, Spain, November 10-12, 2010. Proceedings, pages 177–186, 2010.


[24] J. Zico Kolter and Marcus A. Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. Journal of Machine Learning Research, 8:2755–2790, 2007.

[25] Bartosz Krawczyk, Leandro L. Minku, João Gama, Jerzy Stefanowski, and Michal Wozniak. Ensemble learning for data stream analysis: A survey. Information Fusion, 37:132–156, 2017.


[26] Bartosz Krawczyk and Michał Woźniak. Weighted naïve bayes classifier with forgetting for drifting data streams. In 2015 IEEE International Conference on Systems, Man, and Cybernetics, Kowloon Tong, Hong Kong, October 9-12, 2015, pages 2147–2152, 2015.

[27] Bartosz Kurlej and Michał Woźniak. Active learning approach to concept drift problem. Logic Journal of the IGPL, 20(3):550–559, 2012.


[28] Oscar D. Lara and Miguel A. Labrador. A survey on human activity recognition using wearable sensors. IEEE Communications Surveys and Tutorials, 15(3):1192–1209, 2013. [29] Mark Last. Online classification of nonstationary data streams. Intell. Data Anal., 6(2):129–147, 2002.


[30] Gabriella Melki, Alberto Cano, Vojislav Kecman, and Sebastián Ventura. Multitarget support vector regression via correlation regressor chains. Inf. Sci., 415:53–69, 2017.


[31] Leandro L. Minku, Allan P. White, and Xin Yao. The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans. Knowl. Data Eng., 22(5):730–742, 2010.


[32] Tudor Miu, Paolo Missier, and Thomas Plötz. Bootstrapping personalised human activity recognition models using online active learning. In 15th IEEE International Conference on Computer and Information Technology, CIT 2015; 14th IEEE International Conference on Ubiquitous Computing and Communications, IUCC 2015; 13th IEEE International Conference on Dependable, Autonomic and Secure Computing, DASC 2015; 13th IEEE International Conference on Pervasive Intelligence and Computing, PICom 2015, Liverpool, United Kingdom, October 26-28, 2015, pages 1138–1147, 2015. [33] Silvia Nittel. Real-time sensor data streams. SIGSPATIAL Special, 7(2):22–28, 2015.


[34] Francisco Javier Ordóñez, Gwenn Englebienne, Paula de Toledo, Tim van Kasteren, Araceli Sanchis, and Ben J. A. Kröse. In-home activity recognition: Bayesian inference for hidden markov models. IEEE Pervasive Computing, 13(3):67–75, 2014.


[35] Francisco Javier Ordóñez, José Antonio Iglesias, Paula de Toledo, Agapito Ledezma, and Araceli Sanchis. Online activity recognition using evolving classifiers. Expert Syst. Appl., 40(4):1248–1255, 2013.

[36] Juha Pärkkä, Luc Cluitmans, and Miikka Ermes. Personalization algorithm for real-time activity recognition using pda, wireless motion bands, and binary decision tree. IEEE Trans. Information Technology in Biomedicine, 14(5):1211–1215, 2010.


[37] Bernhard Pfahringer, Geoffrey Holmes, and Richard Kirkby. New options for hoeffding trees. In AI 2007: Advances in Artificial Intelligence, 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, December 2-6, 2007, Proceedings, pages 90–99, 2007.

[38] Sergio Ramírez-Gallego, Bartosz Krawczyk, Salvador García, Michal Wozniak, and Francisco Herrera. A survey on data preprocessing for data stream mining: Current status and future directions. Neurocomputing, 239:39–57, 2017.


[39] J. Ren, S.D. Lee, X. Chen, B. Kao, R. Cheng, and D. Cheung. Naive bayes classification of uncertain data. In Data Mining, 2009. ICDM ’09. Ninth IEEE International Conference on, pages 944–949, 2009.


[40] Jorge Luis Reyes-Ortiz, Alessandro Ghio, Xavier Parra, Davide Anguita, Joan Cabestany, and Andreu Català. Human activity and motion disorder recognition: towards smarter interactive cognitive environments. In 21st European Symposium on Artificial Neural Networks, ESANN 2013, Bruges, Belgium, April 24-26, 2013, 2013.


[41] Piotr Sobolewski and Michał Woźniak. Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors. Journal of Universal Computer Science, 19(4):462–483, 2013.


[42] Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind K. Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys 2015, Seoul, South Korea, November 1-4, 2015, pages 127–140, 2015. [43] Xu Sun, Hisashi Kashima, and Naonori Ueda. Large-scale personalized human activity recognition using online multitask learning. IEEE Trans. Knowl. Data Eng., 25(11):2551–2563, 2013.


[44] Sotiris K. Tasoulis, Charalampos N. Doukas, Vassilis P. Plagianakos, and Ilias Maglogiannis. Statistical data mining of streaming motion data for activity and fall recognition in assistive environments. Neurocomputing, 107:87–96, 2013.


[45] Bogdan Trawiński. Evolutionary fuzzy system ensemble approach to model real estate market based on data stream exploration. J. UCS, 19(4):539–562, 2013.

[46] T. L. M. van Kasteren, G. Englebienne, and B. J. A. Kröse. Activity Recognition in Pervasive Intelligent Environments, chapter Human Activity Recognition from Wireless Sensor Network Data: Benchmark and Software, pages 165–186. Atlantis Press, Paris, 2011.


[47] José Ramón Villar, Silvia González, Javier Sedano, Camelia Chira, and José M. Trejo-Gabriel-Galan. Improving human activity recognition and its application in early stroke diagnosis. Int. J. Neural Syst., 25(4), 2015.

[48] Krzysztof Walkowiak, Róża Goścień, Mirosław Klinkowski, and Michał Woźniak. Optimization of multicast traffic in elastic optical networks with distance-adaptive transmission. IEEE Communications Letters, 18(12):2117–2120, 2014.

[49] Jie Wan, Michael J. O'Grady, and Gregory M. P. O'Hare. Dynamic sensor event segmentation for real-time activity recognition in a smart home context. Personal and Ubiquitous Computing, 19(2):287–301, 2015.


[50] Michał Woźniak. A hybrid decision tree training method using data streams. Knowl. Inf. Syst., 29(2):335–347, 2011.


[51] Michał Woźniak. Application of combined classifiers to data stream classification. In Computer Information Systems and Industrial Management - 12th IFIP TC8 International Conference, CISIM 2013, Krakow, Poland, September 25-27, 2013. Proceedings, pages 13–23, 2013.


[52] Michał Woźniak, Bogusław Cyganek, Andrzej Kasprzak, Paweł Ksieniewicz, and Krzysztof Walkowiak. Active learning classifier for streaming data. In Hybrid Artificial Intelligent Systems - 11th International Conference, HAIS 2016, Seville, Spain, April 18-20, 2016, Proceedings, pages 186–197, 2016.


[53] Michał Woźniak, Manuel Graña, and Emilio Corchado. A survey of multiple classifier systems as hybrid systems. Information Fusion, 16:3–17, 2014.


[54] Michał Woźniak, Andrzej Kasprzak, and Piotr Cal. Weighted aging classifier ensemble for the incremental drifted data streams. In Flexible Query Answering Systems - 10th International Conference, FQAS 2013, Granada, Spain, September 18-20, 2013. Proceedings, pages 579–588, 2013.

[55] Michał Woźniak, Paweł Ksieniewicz, Bogusław Cyganek, and Krzysztof Walkowiak. Ensembles of heterogeneous concept drift detectors - experimental study. In Computer Information Systems and Industrial Management - 15th IFIP TC8 International Conference, CISIM 2016, Vilnius, Lithuania, September 14-16, 2016, Proceedings, pages 538–549, 2016.


[56] Chun Zhu and Weihua Sheng. Motion- and location-based online human daily activity recognition. Pervasive and Mobile Computing, 7(2):256–269, 2011.


[57] Indre Zliobaite, Albert Bifet, Bernhard Pfahringer, and Geoffrey Holmes. Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learning Syst., 25(1):27–39, 2014.
