Neurocomputing 69 (2006) 1754–1759 www.elsevier.com/locate/neucom
Letters
Support vector machine interpretation

A. Navia-Vázquez, E. Parrado-Hernández

DTSC, Univ. Carlos III de Madrid, Avda Universidad 30, 28911-Leganés, Madrid, Spain

Received 25 October 2005; received in revised form 9 December 2005; accepted 13 December 2005. Available online 20 February 2006. Communicated by R.W. Newcomb.

This work has been partially supported by Spain CICYT Grant TEC2005-04264/TCM.
Abstract

Decisions taken by support vector machines (SVMs) are hard to interpret from a human perspective. We take advantage of a compact SVM solution previously developed, known as the growing support vector classifier (GSVC), to provide an interpretation of SVM decisions in terms of a segmentation of the input space into Voronoi sections (determined by the prototypes extracted during GSVC training) plus rules built as linear combinations of the input variables. We show by means of experiments on public domain datasets that the resulting interpretable machines have high fidelity and an accuracy comparable to the SVM.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Support vector machine; Interpretation; Linear rule; Voronoi; Split
1. Introduction

Support vector machines (SVMs) have become a powerful tool for solving many machine learning and signal processing problems, with particularly competitive results in classification. However, SVMs suffer from a "black box" effect: their internal mechanisms are not easy for humans to interpret, so we know their answer to a given input pattern but do not clearly understand the reasons behind that decision. To gain a better understanding, two main families of rule extraction methods have been developed for neural networks (NNs). "Pedagogical" methods directly train an interpretable machine on artificially generated samples labelled by the NN, so as to replicate the original NN boundary with high fidelity (e.g. [7]). "Decompositional" methods operate directly on the internal elements of the trained network, possibly replacing nonlinear elements with linearized or staircase versions until a simplified (interpretable) structure is obtained (e.g. [9]). Techniques that combine both approaches are known as "eclectic" (e.g. [1]).
The degree of interpretability achieved varies, ranging from solutions relying on (zero-order) symbolic rules of the form "IF $x_i > u_i$ AND $x_j > u_j$ AND ... THEN output = +1" to more complex methods such as those relying on ellipsoidal regions [5]. Some attempts have also been made to extract rules specifically from SVMs. The "pedagogical" approaches directly apply existing methods for NNs; they do not exploit any particular structure of the SVM and are therefore of little interest, since they provide no new contribution to SVM interpretation. Others are of the "eclectic" type [1,5], which is conceptually more interesting, but they rely on symbolic rules and their fidelity lies below 92%. Developing "decompositional" methods for SVMs suffers from the large machine size usually obtained (especially when a significant portion of the training patterns become support vectors (SVs)), with the inherent difficulty of handling and analyzing such large structures. Some works also propose approximations where a clustering procedure is applied in a previous stage and ellipsoidal regions are then modelled [5]. However, this latter approach is difficult to interpret and has been shown to scale poorly [1]. We will take advantage here of a particular semiparametric, prototype-based SVM implementation, already shown to yield much smaller machines (by one or two orders of magnitude) than the standard SVM approach while
preserving or even improving the performance. This approach therefore seems more suitable for decompositional analysis, because it is easier to deal with the reduced architecture than to operate directly on the large number of SVs. We propose to build linear decision rules (easy to interpret in terms of the weights of the linear combination of inputs) applied in local Voronoi regions (defined by the prototypes obtained as a result of the particular semiparametric SVM training method used). A Voronoi region associated with a given prototype is defined as the set of points that are closer to that prototype than to any other; in our case, every prototype therefore defines a Voronoi region containing the points of the input space that are closest to it.

In the following section we describe the two methods we propose for linear rule extraction from SVMs. The first one ("eclectic") starts from the structure provided by the growing support vector classifier (GSVC), extracting as many rules as prototypes found by GSVC, and then applies a region splitting procedure until a cross-validation error criterion is met. The second one ("pedagogical") starts from scratch (without information about the SVM architecture) in the derivation of the local rules, and also stops by cross-validation error. The goal here is not to compete with the large number of methods for symbolic rule extraction (benchmarking against all of them exceeds the scope of this paper), but to show that interpretation of SVMs is possible in terms of local linear rules, which admit easy statistical analysis while preserving accuracy and fidelity. In the experimental section, we benchmark the proposed methods on several well-known datasets. Finally, some conclusions and further work are presented.
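To make the nearest-prototype partition concrete, the following short NumPy sketch (illustrative only; the function name and array shapes are our own and not part of the original GSVC code) assigns each input pattern to the Voronoi region of its closest prototype.

```python
import numpy as np

def assign_regions(X, prototypes):
    """Return, for each row of X, the index of its closest prototype.

    X          : (n_samples, n_features) array of input patterns
    prototypes : (n_regions, n_features) array of prototypes p_i
    """
    # Squared Euclidean distance from every pattern to every prototype.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # The Voronoi region of a pattern is that of its nearest prototype.
    return d2.argmin(axis=1)
```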
2. Building linear rules from support vector machines

For any binary classification problem defined by a set of labelled input patterns $\{x_i, y_i\}_{i=1}^{P}$, with $x_i \in \mathbb{R}^N$ and $y_i \in \{-1, +1\}$, all drawn from the same data distribution, the SVM approach first projects the input data into a high-dimensional space, $H$, using a nonlinear projection $\phi(\cdot)$, where $\phi$ has the property that inner products between projected vectors can be computed by means of a kernel function $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Note that in some cases $\phi(x_i)$ can be infinite-dimensional or even unknown (Gaussian kernel case). The procedure known as the "kernel trick" in the SVM literature states that any linear algorithm that can be formulated using only inner products can be translated into a "kernelized" version, in which $\phi(x_i)$ appears only inside inner products, directly computable with the above-mentioned kernel function. In what follows, we restrict ourselves to the Gaussian kernel $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$. After projecting the patterns to $H$, the SVM finds a maximal margin linear classifier, $f(x) = \mathrm{sign}(w^T \phi(x))$ (for ease of explanation we assume a null bias here), such that $w$
is the solution of
$$
\min_{w,\,\xi_i} \left\{ \frac{1}{2} w^T w + C \sum_{i=1}^{P} \xi_i \right\} \qquad (1)
$$
subject to
$$
y_i \left( w^T \phi(x_i) \right) - 1 + \xi_i \geq 0, \quad \forall i = 1, \ldots, P, \qquad (2)
$$
$$
\xi_i \geq 0, \quad \forall i = 1, \ldots, P, \qquad (3)
$$
where $\xi_i$ are positive slack variables introduced to deal with non-separable problems and $C$ is the penalty for patterns incorrectly classified or inside the margin (the SVs). The optimization problem in (1)–(3) is usually solved by using Lagrange multipliers $\alpha_i$ to incorporate the constraints into the functional, converting it into a dual form, and applying a quadratic programming technique (the interested reader may refer to [8] for further details); as a result, we obtain the final SVM classifier
$$
o_j(x_j) = \mathrm{sign}\left( \sum_{i=1}^{N_{SV}} \alpha_i y_i k(v_i, x_j) \right), \qquad (4)
$$
where $N_{SV}$ is the number of SVs ($v_i$), and $\alpha_i$ are the resulting nonzero Lagrange multipliers. Semiparametric SVMs [4,6] are a means to control the size of the classifier by introducing a predefined (or iteratively estimated) prototype-based model in the formulation of the SVM problem. This control of the machine size has also been shown to benefit the generalization capabilities of the trained network. GSVC is a constructive method that incrementally obtains the prototypes needed to determine the SVM structure [6]; the structural risk minimization (SRM) principle is then used to find the weights. As opposed to the standard nonparametric SVM methods, which produce large machines such as (4), GSVC yields a final classifier producing an output decision $o_j$ in response to an input pattern $x_j$:
$$
o_j(x_j) = \mathrm{sign}\left( \sum_{i=1}^{R} \beta_i k(p_i, x_j) \right), \qquad (5)
$$
where $\beta_i$ are the weights of the linear combination of $R$ kernels, $p_i$ are the prototypes determining the SVM architecture, and usually $R \ll N_{SV}$. We propose to take advantage of the semiparametric structure in (5) and use the prototypes $p_i$ to obtain a partition of the input space into Voronoi sections, such that every pattern in the input space can easily be located in one of the regions using a minimal distance criterion. Linear boundaries are then obtained for every section by fitting a linear hyperplane by logistic regression to the patterns inside the region, the desired regression targets being the outputs of the SVM before the sign operation. Since we aim at replicating the SVM decision boundary with high fidelity, we resort to generating an artificial training set by adding noise to the SVs and using the trained SVM to obtain the target values for these new patterns. Note that other learning mechanisms can also be applied (following the
pedagogical approach) to these noisy patterns to obtain other interpretable machines; here C4.5 is used as the baseline reference in the experimental section [1]. Once the linear boundaries have been found for every region, we may identify the region with the largest classification error with respect to the SVM targets and split it into two smaller regions, thereby refining the solution to obtain a more accurate piecewise linear approximation to the desired boundary. We call this approach, which uses the prototypes $p_i$ plus splitting, the "SVMrule" method. The splitting procedure is illustrated in Fig. 1(a) and (b). Let us assume that four prototypes (black squares) currently exist, defining Voronoi regions by minimal distance, and that linear boundaries have been adjusted in every region (marked with dashed lines in Fig. 1(a)), aiming at approximating the SVM boundary (plotted with a continuous line). In this case, the region corresponding to prototype "4" in Fig. 1 is the one with the largest error when predicting the SVM labels, since it shows the largest mismatch between the original boundary and the linear approximation. We therefore split this region by selecting two new candidate prototypes near the boundary, such that the distance between them is maximal (marked with gray circles in Fig. 1(a)), and then removing the old prototype number 4 from the prototype collection. These two new prototypes (labelled 4 and 5 in Fig. 1(b)) define two new Voronoi regions and allow us to obtain new linear boundaries with higher fidelity with respect to the SVM boundary than in Fig. 1(a). This splitting procedure is applied iteratively and stopped when the error on the validation set no longer decreases. An alternative to the "SVMrule" approach, in which the prototypes obtained with GSVC are used for initialization, is to start from scratch, defining a first prototype as the sample mean of all training data, and then apply the splitting procedure iteratively until the error on the validation set no longer decreases. We will denote this
Fig. 1. Illustration of the splitting mechanism. Region corresponding to prototype 4 in (a) is selected for splitting; two new prototypes are defined to replace the old one, and the new Voronoi partition in (b) allows for a better approximation of the SVM boundary (continuous line) in terms of local linear rules (dashed line).
method as "SVMsplit". Both methods can be summarized as follows:

Step 0: Initialize the prototype collection either with the $p_i$ obtained in (5) ("SVMrule" case) or with a single prototype computed as the sample mean of the training patterns ("SVMsplit" case).

Step 1: For every region, adjust a linear boundary by performing a logistic regression on the SVM targets for the noisy training set, and evaluate the classification error with respect to the SVM labels.

Step 2: Select the region with the largest classification error and apply the splitting mechanism explained above.

Step 3: Go back to Step 1 and repeat until the error on a separate validation dataset reaches a minimum or does not decrease any further.

As a result we obtain a set of linear rules, each valid on the Voronoi region defined by the corresponding prototype, so that two steps have to be carried out to evaluate an input pattern $x_i$: first, find the prototype $p_j$ nearest to $x_i$ to determine its corresponding region $j$; second, evaluate a rule of the form "IF $w_j^T x_i > 0$ THEN $o_i = +1$ ELSE $o_i = -1$" (a code sketch of the whole procedure is given below).

When interpreting the machine resulting from either "SVMrule" or "SVMsplit" we obtain two useful pieces of information. Firstly, we learn which group of data a given input pattern belongs to, the overall characteristics of that group being summarized in the corresponding prototype (e.g., in a credit scoring application, we would know that a certain pattern has a large "income" value and a low "age" value). Secondly, by analyzing the linear combination coefficients in every group of data, we are able to identify the most relevant variables (those with the largest weights in absolute value, corresponding to the largest model sensitivity with respect to those input variables) and whether they influence the final decision positively or negatively (e.g., in the same example, "income" could have a large positive weight in the final decision, and "age" a medium negative weight). It would also be possible to identify the (critical) input variable combinations that lie exactly on the decision boundary, by analyzing the solutions of $w_j^T x = 0$.
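The following Python sketch outlines Steps 0–3 for the "SVMsplit" variant, reusing the assign_regions helper above. It assumes a trained classifier object `svm` exposing a scikit-learn-style decision_function; the split heuristic shown (class-conditional means inside the worst region) is a simplification of the "two prototypes near the boundary at maximal distance" rule described in the text, and all names are illustrative rather than the authors' actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_local_rules(X, svm_labels, prototypes):
    """Step 1: fit one linear rule (logistic regression) per Voronoi region."""
    regions = assign_regions(X, prototypes)
    rules, errors = [], []
    for j in range(len(prototypes)):
        idx = regions == j
        if idx.sum() < 2 or len(np.unique(svm_labels[idx])) < 2:
            # Degenerate region: fall back to a constant rule (majority SVM label).
            rules.append(1 if svm_labels[idx].sum() >= 0 else -1)
            errors.append(0.0)
            continue
        clf = LogisticRegression().fit(X[idx], svm_labels[idx])
        rules.append(clf)
        errors.append(np.mean(clf.predict(X[idx]) != svm_labels[idx]))
    return rules, errors

def split_worst_region(X, svm_labels, prototypes, errors):
    """Step 2: replace the worst region's prototype with two new ones (simplified heuristic)."""
    j = int(np.argmax(errors))
    idx = assign_regions(X, prototypes) == j
    Xj, yj = X[idx], svm_labels[idx]
    if errors[j] == 0 or len(np.unique(yj)) < 2:
        return prototypes                       # nothing useful to split
    p_plus = Xj[yj > 0].mean(axis=0)            # new prototype on the +1 side
    p_minus = Xj[yj < 0].mean(axis=0)           # new prototype on the -1 side
    return np.vstack([np.delete(prototypes, j, axis=0), p_plus, p_minus])

def predict(x, prototypes, rules):
    """Two-step evaluation: nearest prototype, then the local linear rule."""
    j = assign_regions(x[None, :], prototypes)[0]
    rule = rules[j]
    if isinstance(rule, int):                   # constant rule for a degenerate region
        return rule
    return int(rule.predict(x[None, :])[0])     # IF w_j^T x > 0 THEN +1 ELSE -1

def svm_split(X, X_val, svm, max_regions=50):
    """'SVMsplit' loop: grow regions until the validation error stops improving."""
    y = np.sign(svm.decision_function(X))       # SVM labels on the (noise-expanded) set
    y_val = np.sign(svm.decision_function(X_val))
    prototypes = X.mean(axis=0, keepdims=True)  # Step 0: single prototype (sample mean)
    best_err, best = np.inf, None
    while len(prototypes) <= max_regions:
        rules, errors = fit_local_rules(X, y, prototypes)
        val_err = np.mean([predict(x, prototypes, rules) != t
                           for x, t in zip(X_val, y_val)])
        if val_err >= best_err:                 # Step 3: stop when validation error stalls
            break
        best_err, best = val_err, (prototypes, rules)
        prototypes = split_worst_region(X, y, prototypes, errors)
    return best
```

The "SVMrule" variant would differ only in Step 0: the prototype array is initialized with the $p_i$ learned by GSVC instead of the sample mean of the training data.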
Fig. 2. Increasingly accurate boundaries produced by the splitting mechanism in the ‘‘circle-in-a-box’’ problem: 5 rules (a), 9 rules (b), 15 rules (c) and C4.5 (d) are shown.
Before continuing to the experimental section, we illustrate in Fig. 2(a)–(d) the qualitative behavior of the linear rule extraction plus splitting mechanism when starting from scratch on a "circle-in-a-box" problem. Note how the SVM boundary (dashed line) is gradually approximated with increasing fidelity as the number of rules grows (5 rules in Fig. 2(a), 9 in Fig. 2(b) and 15 in Fig. 2(c)). We have observed that combinations of linear rules produce much smoother boundaries than those obtained with zero-order mechanisms (symbolic rules), such as those in [1] (see the C4.5 boundary for this problem in Fig. 2(d)). Furthermore, in higher dimensional spaces the problem becomes ill-conditioned, since a hypercube (the region obtained with symbolic rules) in high dimension takes the form of a porcupine, i.e., the ratio between its surface and its volume decreases, producing boundaries with poor generalization capabilities [2]; this is another reason for not using hypercubes for rule extraction. In the following section we quantitatively benchmark the proposed methods on a variety of public domain datasets.

3. Experiments

We evaluated the proposed algorithms on several datasets from the UCI repository (http://www.ics.uci.edu/mlearn/MLSummary.html). We first apply both the standard SVM (SVMlight [3]) and GSVC [6] to these problems, with free parameters adjusted by cross-validation. The performance achieved on these datasets, given in Table 1, shows that GSVC gives better or equal performance than SVMlight in 8 out of 13 datasets (winning cases highlighted in boldface). Average values are also favorable to GSVC, especially concerning the complexity of the resulting machines (107.3 vs. 894.7), which greatly favors the extraction of rules from the trained machines. In what follows we use the GSVC machines as the reference for the rule extraction process. We evaluate two measures: classification error (CE) and fidelity (F). The CE is defined as the average number of errors between the correct labels and the labels predicted by any of the evaluated learning machines (SVMlight, GSVC, C4.5, SVMrule, SVMsplit). If $o_i$ are the predicted labels ($+1$, $-1$) and $y_i$ are the optimal labels, then the CE in
Table 1
Performance of SVMlight and GSVC in the UCI datasets

Dataset       N_train   N_var   SVMlight CE   SVMlight Size   GSVC CE   GSVC Size
Twonorm       5920      20      2.2           3727            2.1       81
Waveform      4000      21      8.3           1478            7.9       60
Hand digit    3823      64      1.5           801             2.1       257
Landsat       4435      36      0.4           94              0.25      65
Ringnorm      5920      20      3.3           3566            1.3       62
Spam          3680      57      5.7           618             6.5       193
Ripley        250       2       10.0          61              9.6       49
Breast        466       9       3.9           14              4.7       49
Heart         202       13      17.8          14              11.8      49
Image         1540      18      3.6           112             2.3       233
Pima          512       8       25.8          126             24.1      49
Average       2795.3    24.4    7.5           894.7           6.6       107.3

Number of training patterns (N_train), number of input variables (N_var), classification error (CE), and machine size (Size) are shown.
percentage is computed as
$$
\mathrm{CE} = \frac{100}{P} \sum_{i=1}^{P} (o_i \neq y_i), \qquad (6)
$$
where the operation $(o_i \neq y_i)$ gives 1 when $o_i$ is different from $y_i$ and zero when they are equal, and $P$ is the total
number of patterns. Therefore, CE measures how well a given machine approximates the optimal solution to a given problem. Fidelity (F) measures how well a given machine learning method approximates the outputs of another one. In our case, the method we try to replicate is the SVM (which provides reference outputs $o_i$), and the model that tries to replicate its behavior is the proposed piecewise linear model, giving outputs $o'_i$. Therefore F, in percentage, is computed as
$$
\mathrm{F} = \frac{100}{P} \sum_{i=1}^{P} (o_i == o'_i), \qquad (7)
$$
where the operation $(o_i == o'_i)$ gives 1 when $o_i$ and $o'_i$ are equal and zero otherwise. For baseline comparison purposes, we applied C4.5 directly to the training data (results shown in the first columns of Table 2), and also evaluated the result of SVM interpretation in terms of zero-order rules as in [1]: we generated an extended dataset by replicating the training points, adding noise to them and labelling them with the GSVC machine, and then applied C4.5. The results of C4.5 trained on these noise-expanded datasets are collected in the second part of Table 2 ("C4.5+noise" case). It can be observed that the classification error (CE) of basic C4.5 is much higher than that of GSVC (on average, 11.99 vs. 6.6), and the average fidelity is not very high (around 89%). Extracting a tree from the noisy dataset seems to work better, since performance improves in 8 out of 13 datasets (although the average CE is worse due to the poor performance in the "Spam" case). However, the number of rules obtained with this second C4.5 method is much larger (on average, 745.5 vs. 78.4), which severely reduces the interpretability of the resulting set of rules. The number of antecedents (N_A) is also shown.
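As a concrete reading of Eqs. (6) and (7), the short fragment below (illustrative; it simply mirrors the definitions, assuming label vectors taking values +1/-1) computes both figures of merit.

```python
import numpy as np

def classification_error(predicted, true_labels):
    """Eq. (6): percentage of patterns on which a machine disagrees with the true labels."""
    predicted, true_labels = np.asarray(predicted), np.asarray(true_labels)
    return 100.0 * np.mean(predicted != true_labels)

def fidelity(replica_outputs, svm_outputs):
    """Eq. (7): percentage of patterns on which the interpretable machine agrees with the SVM."""
    replica_outputs, svm_outputs = np.asarray(replica_outputs), np.asarray(svm_outputs)
    return 100.0 * np.mean(replica_outputs == svm_outputs)
```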
Table 2
Performance of C4.5 and C4.5+noise rule extraction in the UCI datasets

              C4.5                        C4.5+noise
Dataset       CE      N_R    N_A          CE      N_R     N_A    F (%)
Twonorm       17.2    251    8.9          11.8    2662    12.9   88.7
Waveform      13.5    143    10           13.2    1155    12.8   89.9
Hand digit    6.9     82     8.2          4.1     455     11.2   95.8
Landsat       1.15    17     4.9          0.75    155     10.1   99.5
Ringnorm      9.1     169    13.8         6.5     1434    7.7    80.9
Spam          7.3     107    9.5          36.4    1823    13.7   66.2
Ripley        10.0    4      2.8          10.0    26      5.3    98.8
Breast        9.1     9      3.5          5.5     23      5.4    96.6
Heart         27.7    19     5.1          29.7    87      7.4    74.2
Image         2.7     34     8.3          3.25    276     13.2   96.7
Pima          27.3    28     6.6          24.2    105     7.7    90.6
Average       11.99   78.4   7.4          13.2    745.5   9.7    88.9

Classification error (CE), number of rules (N_R), number of antecedents (N_A), and fidelity (F) are shown.
Table 3
Performance of SVMsplit and SVMrule in the UCI datasets

              SVMsplit                    SVMrule
Dataset       CE      N_R    F (%)        CE      N_R     F (%)
Twonorm       2.2     1      99.9         2.5     99      99.1
Waveform      8.8     24     95.5         8.2     61      97.9
Hand digit    4.6     23     95.9         3.0     259     97.6
Landsat       1.4     11     98.8         0.4     71      99.8
Ringnorm      8.2     39     91.1         4.6     70      96.1
Spam          10.8    71     92.4         9.3     198     93.9
Ripley        9.4     5      98.4         9.7     52      99.8
Breast        4.3     2      98.7         4.7     49      99.2
Heart         10.9    2      99.0         11.8    49      98.1
Image         7.5     78     93.7         4.4     243     97.1
Pima          23.8    20     95.7         21.4    62      98.1
Average       8.3     25.1   96.3         7.3     110.3   97.9

Classification error (CE), number of rules (N_R), and fidelity (F) are shown.
The results for the proposed "SVMsplit" and "SVMrule" procedures are given in Table 3. The number of rules needed by the "SVMsplit" method is very low (25.1 on average) and, most importantly, on the easy (almost linearly separable) datasets "Twonorm", "Ripley", "Breast" and "Heart", only 1–5 rules are needed to achieve a very low CE, comparable with GSVC's results. On more complicated datasets, however, the "SVMrule" approach gives better results (because it benefits from the SVM structure, i.e., the prototype identification carried out by the GSVC procedure), at the cost of an increased number of rules (110.3 on average). Note that applying "SVMrule" directly to the standard SVM machine would imply an average number of rules of about 894.7 (see Table 1 for the relative complexities). In summary, "SVMrule" has an average CE of 7.3, which compares much more favorably to GSVC (6.6) than the C4.5 methods do (11.99 and 13.2), and is even slightly better than the CE obtained with the standard SVM (7.5). With respect to complexity, the proposed methods "SVMsplit" and "SVMrule" produce on average far fewer rules (25.1 and 110.3, respectively) than the best performing C4.5 rule extraction method (745.5), and the number of sequential evaluations (N_A) also compares well, since in "SVMrule" and "SVMsplit" it is always 2, while the C4.5 rule extraction method needs 9.7 evaluations on average.

4. Conclusions and further work

We have proposed two SVM interpretation methods that extract linear rules in local (Voronoi) regions of the input space. Both of them take advantage of a previously developed semiparametric support vector machine training method called GSVC: the "SVMsplit" method starts from scratch and uses the improved labels provided by GSVC to learn local rules, and the "SVMrule" method
starts with the prototypes selected during GSVC training. We have shown on several standard datasets that the performance of these methods is comparable to that obtained with the original SVMs and that fidelity is around 97%. They also seem to outperform approaches relying on zero-order rules, when benchmarked against a C4.5-based rule extraction method. We have not worked out the final interpretation of the extracted rules any further, since it is straightforward to identify the contribution, effect and relevance of every input variable in the decision taken by the linear classifier of every local rule. As further work, we propose to study methods to parallelize and distribute the computations, since the noise expansion method may lead to very large training datasets, and to further simplify the obtained sets of rules for even easier interpretation. The possibility of applying this method for feature selection will also be investigated.

References

[1] N. Barakat, J. Diederich, Eclectic rule-extraction from support vector machines, Int. J. Comp. Intell. 2 (1) (2005) 59–62.
[2] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, San Diego, 1991.
[3] T. Joachims, Making large-scale SVM learning practical, in: B. Schölkopf, C. Burges, A. Smola (Eds.), Advances in Kernel Methods—Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 169–184.
[4] A. Navia-Vázquez, F. Pérez-Cruz, A. Artés-Rodríguez, A. Figueiras-Vidal, Weighted least squares training of support vector classifiers leading to compact and adaptive schemes, IEEE Trans. Neural Networks 12 (5) (2001) 1047–1059.
[5] H. Núñez-Castro, C. Angulo-Bahón, A. Català-Mallofré, Rule based learning systems from SVM and RBFNN, Tendencias de la Minería de Datos en España, Red Española de Minería de Datos, vol. 1, 2004, pp. 13–24. (Available online in English at http://www.lsi.us.es/redmidas/Capitulos/LMD02.pdf)
[6] E. Parrado-Hernández, J. Arenas-García, I. Mora-Jiménez, A.R. Figueiras-Vidal, A. Navia-Vázquez, Growing support vector classifiers with controlled complexity, Pattern Recogn. 36 (7) (2003) 1479–1488.
[7] G.P.J. Schmitz, C. Aldrich, F.S. Gouws, ANN-DT: an algorithm for extraction of decision trees from artificial neural networks, IEEE Trans. Neural Networks 10 (3) (1999) 1392–1402.
[8] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[9] R. Setiono, W.K. Leow, J. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks 13 (3) (2002) 564–577.
Angel Navia-Vázquez received his degree in Telecommunications Engineering in 1992 (Universidad de Vigo, Spain) and his Ph.D., also in Telecommunications Engineering, in 1997 (Universidad Politécnica de Madrid, Spain). He is now an Associate Professor at the Department of Signal Processing and Communications, Universidad Carlos III de Madrid, Spain. His research interests are focused on new architectures and algorithms for nonlinear processing, as well as their application to multimedia processing, communications, data mining, content management and e-learning. He has (co)authored 16 international refereed journal papers in these areas, several book chapters and more than 40 conference communications, and has participated in more than 20 research projects. He has been an IEEE (Senior) Member since 1999 and an Associate Editor of the IEEE Transactions on Neural Networks since January 2004.

Emilio Parrado-Hernández received his Telecommunications Engineering degree from the University of Valladolid (Spain) in 1999 and his Ph.D. in Communication Technologies from the University Carlos III of Madrid (Spain) in 2003. He holds a Visiting Lecturer position in the Department of Signal Processing and Communications of University Carlos III, Spain. His main research interests include machine learning, especially kernel methods and their applications to signal and data processing. He has coauthored about 15 papers in this field.