Pattern Recognition Letters 37 (2014) 1–3
Editorial
Partially supervised learning for pattern recognition

This Special Issue (SI) originates from the first IAPR TC3 Workshop on Partially Supervised Learning (PSL2011) (Schwenker and Trentin, 2012), which we organized in Ulm, Germany, in September 2011, co-sponsored by the International Association for Pattern Recognition (IAPR) and its Technical Committee 3 on Neural Networks & Computational Intelligence. At that time, we felt the need to create a common ground for researchers active in the (often quite diverse) areas that, broadly speaking, involve a "combination" of both supervised and unsupervised learning paradigms, not limited to semi-supervised learning (SSL) (Zhu et al., 2005) alone. After the success of the workshop, and in light of the excellent scientific contributions presented there, we decided to propose an SI to a relevant journal, based on extended versions of selected papers from PSL2011. The Editorial Board of Pattern Recognition Letters (an official publication of the IAPR) accepted our proposal, encouraging us to proceed with the initiative. In doing so, we also decided to extend the scope of the SI to an even broader audience by means of an open call for papers on the topic of partially supervised learning (PSL) for pattern recognition. The call received an enthusiastic response from the community. As a consequence, this SI is the result of a selection of the best peer-reviewed submissions from both sources.
1. What do we mean by "partially supervised learning"?

PSL is a research field of machine learning and pattern recognition which builds upon the broad areas of (i) learning, clustering, and/or estimating statistical classifiers from partially labeled data, and (ii) combining supervised and unsupervised learning, including general statistical estimation techniques. More specifically, PSL embraces SSL (with a special emphasis on classification, regression, and clustering (Zhu et al., 2005; Basu et al., 2004)), diffusion learning (Gori et al., 2009; Zhou et al., 2004), active learning (Settles, 2009), SSL in neural networks (Frasca et al., 2013) and deep architectures (Weston et al., 2008), SSL with vague, fuzzy, or uncertain teaching signals (Parthalain and Jensen, 2011; Yan et al., 2013), and SSL in multiple classifier systems and ensembles (Hady et al., 2010). PSL studies are particularly expected to involve the identification of scenarios for successful real-world applications. To state just a few examples, significant instances of PSL applications are rooted in image and signal processing (Du et al., 2013), multimodal information processing (Guillaumin et al., 2010), sensor/information fusion (Song et al., 2008), human-computer interaction (Cohen et al., 2004), data mining and web mining (Chakrabarti et al., 2002), and cheminformatics and bioinformatics (Qi et al., 2010).

Whilst a survey of the area is beyond the scope of this Editorial, it is a policy of Pattern Recognition Letters to invite a review article, covering the state of the art of the specific topic, to be included in the SIs published by the journal. Therefore, the reader is referred to this review article for a survey and categorization of the most prominent PSL approaches to date.

2. The Special Issue: organization and list of papers

In the following, we provide the reader with an ordered list of the papers included in this SI. The first paper of the Special Issue is a review article entitled Pattern classification and clustering: a review of partially supervised learning approaches, where Friedhelm Schwenker and Edmondo Trentin discuss the state-of-the-art approaches to partially supervised learning, with special emphasis on pattern recognition and clustering involving partially labeled datasets. A concise overview of the major themes addressed by each paper is given as well. We organized the papers according to a taxonomy reflecting their nature, in substantial compliance with the categorization of PSL approaches surveyed in the review article. This taxonomy encompasses, in order: (1) semi-supervised classification, (2) semi-supervised clustering, (3) active learning, (4) multi-view learning, (5) PSL in ensembles, and (6) PSL in artificial neural networks (ANN), support vector machines (SVM), and kernel machines.
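As a concrete illustration of the general idea, consider self-training, arguably the simplest PSL scheme: a classifier fitted on the labeled pool repeatedly pseudo-labels the unlabeled points it is most confident about. The sketch below is a deliberately simplified toy example (the data, function name, and nearest-centroid classifier are all invented for illustration, and are not the method of any paper in this issue):

```python
# A toy sketch of self-training, arguably the simplest PSL/SSL scheme:
# a nearest-centroid classifier repeatedly pseudo-labels the unlabeled
# points it is most confident about. All names and data are illustrative.
import numpy as np

def self_train(X, y, n_iter=10, batch=5):
    """Pseudo-label unlabeled points (marked -1) a few at a time."""
    y = y.copy()
    for _ in range(n_iter):
        labeled = y != -1
        unlabeled = np.where(~labeled)[0]
        if unlabeled.size == 0:
            break                      # everything has been labeled
        classes = np.unique(y[labeled])
        # class centroids computed from the currently labeled pool
        centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
        # distances: rows = unlabeled points, columns = classes
        d = np.linalg.norm(X[unlabeled, None, :] - centroids[None, :, :], axis=2)
        # pseudo-label the `batch` points closest to any centroid (most confident)
        order = np.argsort(d.min(axis=1))[:batch]
        y[unlabeled[order]] = classes[d[order].argmin(axis=1)]
    return y

# Two well-separated Gaussian clusters, but only ONE labeled point per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)
y_seed = np.full(100, -1)
y_seed[[0, 50]] = y_true[[0, 50]]
y_hat = self_train(X, y_seed, n_iter=20, batch=10)
```

With well-separated clusters, two seed labels suffice to propagate correct labels to the whole sample; with overlapping classes, self-training can just as easily propagate early mistakes, which is one motivation for the more principled approaches surveyed in this issue.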
1. Semi-Supervised Classification: first, we introduce the contributions revolving around the topic of utmost interest to the Journal, namely PSL techniques developed and applied within the framework of pattern classification. In their paper Unlabeling data can improve classification accuracy, Ludwig Lausser, Florian Schmid, Matthias Schmid, and Hans Kestler present stimulating experimental findings on the effects of setting different trade-offs between the fractions of labeled and unlabeled data in the training set used for developing a classifier. The investigation is conducted in the domain of bioinformatics and aims at the classification of DNA microarrays. The next paper, Semi-Supervised Linear Discriminant Analysis through Moment-Constraint Parameter Estimation by Marco Loog, introduces a variant of the semi-supervised linear discriminant function. The approach does not rely on arbitrary assumptions about the information mutually conveyed by labeled and unlabeled data, e.g. that a diffusion graph based on Euclidean distance captures the proper relationships among the data. It rather extrapolates constraints from the unlabeled sample which "regularize" the supervised estimation of the parameters of the classifier, leading to improved recognition accuracies over standard linear discriminants. In the article Transfer Learning with One-Class Data by Jixu Chen and Xiaoming Liu, a transfer learning problem for binary classification is considered. The presented scenario assumes
that training data of only one class is available. For this setting, the authors propose a regression-based learning algorithm and introduce a feature selection framework in order to select the most transferable discriminative features. Numerical experiments are carried out in two different applications: facial expression classification and landmark detection. Zehra Cataltepe, Abdullah Sonmez and Baris Senliol introduce, in their paper on Feature Enrichment and Selection for Transductive Classification on Networked Data, an algorithm for transductive network classification. They consider a graph-based learning scheme to combine feature enrichment and feature selection into the overall classifier framework. A statistical evaluation is carried out for three different network databases. The paper A case study of linear classifiers adapted using imperfect labels derived from human event-related potentials by Timothy J. Zeyl and Tom Chau discusses how to utilize uncertain labels in pattern recognition tasks. The authors propose a learning algorithm that is able to adapt linear classifiers on imperfectly labeled data, so that both perfectly and imperfectly labeled examples are involved in the learning process. Their results show that using additional uncertain training data can improve the classification performance of trained models. Boosting for Multiclass Semi-Supervised Learning by Jafar Tanha, Maarten van Someren and Hamideh Afsarmanesh tackles the problem of multiclass pattern recognition in the context of semi-supervised learning. The authors introduce a multiclass loss function consisting of margin costs and two regularization terms on the labeled and the unlabeled data, thereby avoiding conventional multiclass decomposition schemes such as one-vs-rest or one-vs-one. They present a benchmark study using the UCI repository and text classification data sets.

2. Semi-Supervised Clustering: from classification we move to the other prominent paradigm of pattern recognition, namely clustering. Stefan Faußer and Friedhelm Schwenker, in their paper Semi-supervised clustering of large data sets with kernel methods, propose a novel meta-algorithm for kernel-based clustering of large, partially labeled datasets. The meta-algorithm, which proceeds by finding prototypes from the labeled data and then clustering the remaining data iteratively, reduces the overall computational complexity of the procedure w.r.t. standard methods and may be instantiated with a variety of different kernel-based clustering techniques. The next paper, Context-sensitive Intra-class Clustering by Yingwei Yu, Ricardo Gutierrez-Osuna, and Yoonsuck Choe, introduces a promising approach to intra-class clustering by means of a new SSL algorithm. The latter aims at partitioning the data such that the inter-class overlap is minimized. The method is evaluated in classification tasks faced with linear and quadratic discriminant functions. In the paper A new Interactive Semi-Supervised Clustering model for large image database indexing, the authors Lai Hien Phuong, Muriel Visani, Alain Boucher and Jean-Marc Ogier study the integration of prior expert knowledge into the clustering process by incorporating pairwise constraints between images of the training database. They propose different strategies to implement this type of expert knowledge, and they evaluate their algorithms on three different image databases. Yuanli Pei and Xiaoli Z. Fern present, in Constrained Instance Clustering in Multi-Instance Multi-Label learning, a clustering algorithm in the context of multi-instance multi-label learning, where bag labels are used as background knowledge to cluster instances. The authors propose to incorporate soft constraints at the bag level in the clustering algorithm instead of pairwise constraints, and demonstrate this for a spectral clustering technique. They show promising results for their methods on artificial and real-world data sets.
In their paper Hierarchical Spatiotemporal Feature Extraction using Recurrent Online Clustering, S.R. Young, A. Davis, A. Mishtal and I. Arel introduce a deep learning architecture which is able to learn spatiotemporal patterns by means of a recurrent clustering algorithm. The units of the architecture are trained independently from each other, instead of applying the layer-by-layer training proposed in conventional deep learning schemes. The statistical evaluation of the proposed algorithm shows state-of-the-art classification accuracies on the MNIST benchmark database.

3. Active Learning: an active learning algorithm has control over the training set and selects on its own the instances to be used for training, so that the resulting classifier can be more accurate with a smaller amount of labels than a passive learner. Jesús González-Rubio and Francisco Casacuberta introduce, in their paper Cost-Sensitive Active Learning for Computer-Assisted Translation, an active learning framework for computer-assisted translation. In contrast to many standard active learning algorithms, their learning model not only minimizes the number of translations but also takes into account the difficulty of each translation during the labeling process. Furthermore, they suggest including a dynamic machine translation model in the translation system that continuously receives user feedback. Another application domain for active learning is considered in the paper Effective Balancing Error and User Effort in Interactive Handwriting Recognition by N. Serrano, J. Civera, A. Sanchis and A. Juan. In their work they present a combination of semi-supervised learning and active learning for the transcription of handwritten documents. They propose to embed the learning algorithm into an interactive transcription system and demonstrate the effectiveness of this scheme by means of a statistical evaluation on two handwritten documents.
The combination of active and semi-supervised techniques is a promising direction for future theoretical research as well as for practical applications.

4. PSL in Multi-View Learning and Ensembles: different views (i.e., feature spaces) on the patterns may involve diverse subsets of either labeled or unlabeled training data. As a consequence, the individual views can complement each other in bringing in the information needed by the partially supervised learner. The paper by Gaole Jin and Raviv Raich entitled Hinge Loss Bound Approach for Surrogate Supervision Multi-view Learning assumes a setup where a classifier is sought on a view whose data samples are all unlabeled, whereas other data samples are available for which either (i) labels are available for other views, or (ii) both views are simultaneously present (still, the corresponding patterns are unlabeled). This "surrogate supervision" goal is pursued by means of a variant of the SVM, optimized under an upper bound on the traditional hinge loss function. The technique is applied to a lip-reading task. Luca Didaci, Gian Luca Marcialis and Fabio Roli study, in their paper Analysis of unsupervised template update in biometric recognition systems, mono- and multi-modal biometric systems, and investigate the behavior of automatic self- and co-updating in this context. The experimental study on the DIEE Multimodal benchmark database shows that the classification performance of co-updating is superior to that of self-updating. Specific PSL machines have been proposed in the realm of ensembles, mixtures of experts, and multiple classifier systems as well. Yann Soullard, Martin Saveski and Thierry Artières, in their paper Joint semi-supervised Learning of Hidden Conditional Random Fields and Hidden Markov Models, investigate an area that is mostly unexplored to date, namely SSL over sequential data.
This is accomplished by means of a semi-supervised variant of traditional hidden Markov models, whose parameters are iteratively used for estimating the corresponding parameters of
a hidden conditional random field. In perspective, the approach also pinpoints a plausible direction for SSL over structured data. In Semi-Supervised Ensemble Update Strategies for On-line Classification of fMRI Data, Catrin Olivier Plumpton introduces a random subspace learning method utilizing ensembles of linear base classifiers for the task of fMRI classification. She develops an online learning algorithm for ensembles in a self-training scenario, in which the ensemble decision and the confidence of that decision are incorporated into the learning process. The algorithms are evaluated on two emotion-related fMRI data sets.

5. PSL in Artificial Neural Networks, Support Vector Machines and Kernel Machines: due to their utmost relevance to the fields of pattern recognition and machine learning, ANN, SVM, and kernel machines deserve a separate treatment within the framework of PSL. In Combination of Supervised and Unsupervised Learning for Training the Activation Functions of Neural Networks, Ilaria Castelli and Edmondo Trentin propose a connectionist architecture whose activation functions can be learned from the data according to a pair of co-trained models. The former, realized via a standard supervised ANN, encapsulates the adaptive form of the activation function. The latter, based on the unsupervised maximum-likelihood estimation of the parameters of a Gaussian mixture model, yields a probabilistic credit to each neuron in the ANN (according to the specific input pattern). Recursive extensions (such that adaptive activation functions are modeled via ANNs which, in turn, include adaptive activation functions, and so on) lead to a novel family of deep architectures. In Laplacian Minimax Probability Machine, Kazuki Yoshiyama and Akito Sakurai exploit manifold regularization for introducing a semi-supervised version of the minimax probability machine.
Theoretical achievements concerning the resulting machine are presented, along with experimental comparisons with state-of-the-art graph-based SSL approaches. F. Mordelet and J.-P. Vert study, in A bagging SVM to learn from positive and unlabeled examples, the problem of learning an SVM from positive and unlabeled data, from the perspective of both inductive and transductive learning. Bootstrap aggregation is proposed to iteratively train an ensemble of binary classifiers that discriminate the positive examples from random subsets of the unlabeled data. The authors analyze their algorithm both theoretically and experimentally. Finally, in their paper entitled Unlabeled Patterns to Tighten Rademacher Complexity Error Bounds for Kernel Classifiers, Davide Anguita, Alessandro Ghio, Luca Oneto and Sandro Ridella study the effects induced by a partially supervised dataset on the upper bounds of the generalization error of a kernel machine. Taking into consideration unlabeled data in addition to the labeled patterns, the paper shows that the confidence term of the Rademacher complexity bound can be reduced to one third of its default value. The result can be useful, for instance, for the purposes of model selection.
We wish to express our gratitude to the Editor-in-Chief of Pattern Recognition Letters, Gabriella Sanniti di Baja, who supported our project and managed brilliantly and patiently the SI we are proud to have guest-edited. We are also thankful to the Elsevier Editorial Staff (in particular the Journal Manager, Janet Amali Joseph), and to the many Reviewers who contributed, promptly and conscientiously, to the fulfillment of the present SI. Finally, we are grateful to all the authors of the papers embraced herein, who reacted promptly and enthusiastically to our editorial initiative with contributions of such quality.

References

Basu, S., Bilenko, M., Mooney, R.J., 2004. A probabilistic framework for semi-supervised clustering. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04. ACM, pp. 59–68.

Chakrabarti, S., 2002. Mining the Web: Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann.

Cohen, I., Cozman, F.G., Sebe, N., Cirelo, M.C., Huang, T.S., 2004. Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (12), 1553–1567.

Du, Y., Su, C., Cai, Z., Guan, X., 2013. Web page and image semi-supervised classification with heterogeneous information fusion. Journal of Information Science 39 (3), 289–306.

Frasca, M., Bertoni, A., Re, M., Valentini, G., 2013. A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Networks 43, 84–98.

Gori, M., 2009. Diffusion learning and regularization. In: Proc. of the 2009 Conference on New Directions in Neural Networks: 18th Italian Workshop on Neural Networks, WIRN 2008. IOS Press, pp. 127–137.

Guillaumin, M., Verbeek, J., Schmid, C., 2010. Multimodal semi-supervised learning for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR. pp. 902–909.

Hady, M.F.A., Schwenker, F., Palm, G., 2010. Semi-supervised learning for tree-structured ensembles of RBF networks with co-training. Neural Networks 23 (4), 497–509.

Parthalain, N.M., Jensen, R., 2011. Fuzzy-rough set based semi-supervised learning. In: IEEE International Conference on Fuzzy Systems. pp. 2465–2472.

Qi, Y., Tastan, O., Carbonell, J.G., Klein-Seetharaman, J., Weston, J., 2010. Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins. Bioinformatics 26 (18), i645–i652.

Schwenker, F., Trentin, E. (Eds.), 2012. Partially Supervised Learning – First IAPR TC3 Workshop, PSL 2011, Ulm, Germany, September 15–16, 2011. Revised Selected Papers. Vol. 7081 of Lecture Notes in Computer Science, Springer.

Settles, B., 2009. Active learning literature survey. Tech. Rep. 1648, University of Wisconsin-Madison. http://pages.cs.wisc.edu/~bsettles/active-learning.

Song, Y., Zhang, C., 2008. Content-based information fusion for semi-supervised music genre classification. IEEE Transactions on Multimedia 10 (1), 145–152.

Weston, J., Ratle, F., Collobert, R., 2008. Deep learning via semi-supervised embedding. In: Proc. of the 25th International Conference on Machine Learning, ICML '08. ACM, New York, NY, USA, pp. 1168–1175.

Yan, Y., Chen, L., Tjhi, W.-C., 2013. Fuzzy semi-supervised co-clustering for text documents. Fuzzy Sets and Systems 215, 74–89.

Zhou, D., Schölkopf, B., 2004. A regularization framework for learning from graph data. In: ICML Workshop on Statistical Relational Learning. pp. 132–137.

Zhu, X., 2005. Semi-supervised learning literature survey. Tech. Rep. 1530, Computer Sciences, University of Wisconsin-Madison.
Acknowledgments

The PSL2011 Workshop that originated the idea of this Special Issue in the first place was co-sponsored by IAPR and its TC3, which are gratefully acknowledged. The partnership between the Universities of Ulm and Siena was partially granted under the Vigoni Program, managed by the Ateneo Italo-Tedesco/Deutsch-Italienisches Hochschulzentrum and co-funded by the German DAAD and the Italian MIUR.

Friedhelm Schwenker
Institute of Neural Information Processing, Ulm University, 89069 Ulm, Germany
E-mail address: [email protected]
URL: http://www.uni-ulm.de/en/in/institute-of-neural-information-processing/members/f-schwenker.html

Edmondo Trentin
Dipartimento di Ingegneria dell'Informazione e Scienze Matematiche, Università di Siena, Via Roma 56, 53100 Siena, Italy
E-mail address: [email protected]
URL: http://www.dii.unisi.it/~trentin

Available online 24 October 2013