ESANN2012

20th European Symposium on Artificial Neural Networks
Bruges, Belgium, April 25-26-27


Content of the proceedings

WARNING: you need Adobe Acrobat Reader 7.0 or later to view the PDF files below



Theory and practice of adaptive input driven dynamical systems


ES2012-6

Theory of Input Driven Dynamical Systems

Manjunath Gandhi, Peter Tino, Herbert Jaeger

Abstract
Most dynamic models of interest in machine learning, robotics, AI or cognitive science are nonautonomous and input-driven. In the last few years, a number of important innovations have occurred in mathematical research on nonautonomous systems. The notion of an attractor is fundamental to understanding the long-term behavior of nonautonomous systems. With a time-varying input, it turns out that for a notion of an attractor to be useful, the attractor cannot be a single subset, but must be conceived as a sequence of sets varying with time as well. The aim of this tutorial is to illuminate useful notions of attractors of nonautonomous systems, and also to introduce some newly emerging concepts of dynamical systems theory which are particularly relevant for input-driven systems.

Manuscript from author [PDF]

ES2012-142

Simple reservoirs with chain topology based on a single time-delay nonlinear node

José Manuel Gutiérrez, D. San-Martín, Silvia Ortin, Luis Pesquera

Abstract
A physical scheme based on a single nonlinear dynamical system with delayed feedback has been recently proposed for Reservoir Computing (RC) [1]. In this paper we present a computational implementation of this idea using a simple chain topology with properties derived from its physical counterpart (e.g. the reservoir is defined by two tunable parameters related to feedback- and input-strength terms). An application to time series prediction is described and a comparison with other standard reservoir computing methods is given.
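As a rough illustration of this class of reservoir, the sketch below simulates a single nonlinear node with delayed feedback whose delay line is split into "virtual" nodes coupled in a chain; the specific update rule, random input mask, and coupling strength are illustrative assumptions, not the authors' exact equations.

```python
import numpy as np

def delay_reservoir(u, N=50, eta=0.5, gamma=0.05, seed=0):
    # N virtual nodes along the delay line; a random +/-1 mask
    # time-multiplexes each input sample across the virtual nodes.
    # eta and gamma play the role of the two tunable feedback- and
    # input-strength parameters mentioned in the abstract.
    rng = np.random.default_rng(seed)
    mask = rng.choice([-1.0, 1.0], size=N)
    x = np.zeros(N)                       # virtual-node states
    states = np.empty((len(u), N))
    for t, ut in enumerate(u):
        prev = x.copy()                   # states one full delay ago
        for i in range(N):
            chain = x[i - 1] if i > 0 else prev[-1]   # chain coupling
            x[i] = np.tanh(eta * prev[i] + 0.1 * chain
                           + gamma * mask[i] * ut)
        states[t] = x
    return states

u = np.sin(np.linspace(0, 20, 200))
S = delay_reservoir(u)                    # (time, virtual nodes)
```

A linear readout trained on `S` would then complete the reservoir computing setup.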

Manuscript from author [PDF]

ES2012-175

Balancing of neural contributions for multi-modal hidden state association

Christian Emmerich, R. Felix Reinhart, Jochen J. Steil

Abstract
We generalize the formulation of associative reservoir computing networks to multiple input modalities and demonstrate applications in image and audio processing scenarios. Robust association with reservoir networks requires coping with the potential error amplification of output feedback dynamics and handling differently sized input and output modalities. We propose a dendritic neuron model in combination with a modified reservoir regularization technique to address both issues.

Manuscript from author [PDF]

ES2012-45

Input-Output Hidden Markov Models for trees

Davide Bacciu, Alessio Micheli, Alessandro Sperduti

Abstract
The paper introduces an input-driven generative model for tree-structured data that extends the bottom-up hidden tree Markov model with non-homogeneous transition and emission probabilities. The advantage of introducing input-driven dynamics in structured-data processing is experimentally investigated. The results of this preliminary analysis suggest that input-driven models can capture more discriminative structural information than non-input-driven approaches.

Manuscript from author [PDF]

ES2012-89

Constructive Reservoir Computation with Output Feedbacks for Structured Domains

Claudio Gallicchio, Alessio Micheli, Giulio Visco

Abstract
We introduce a novel constructive algorithm which progressively builds the architecture of GraphESN, a model that generalizes Reservoir Computing to learning in graph domains. Exploiting output feedback signals in a forward fashion during this construction allows us to introduce supervision into the reservoir encoding process. The potential of the proposed approach is experimentally assessed on real-world tasks from toxicology.

Manuscript from author [PDF]

ES2012-123

Process Mining in Non-Stationary Environments

Phil Weber, Peter Tino, Behzad Bordbar

Abstract
Process Mining uses event logs to discover and analyse business processes, which are typically assumed to be static. However, as businesses adapt to change, their processes can be expected to change too. Since one application of process mining is ensuring conformance to prescribed processes or rules, timely detection of change is important. We consider process mining in such non-stationary environments and show that, using a probabilistic view of processes, timely and confident detection of change is possible.

Manuscript from author [PDF]

ES2012-189

Short Term Memory Quantifications in Input-Driven Linear Dynamical Systems

Peter Tino, Ali Rodan

Abstract
We investigate the relation between two quantitative measures characterizing short term memory in input driven dynamical systems, namely the short term memory capacity (MC) and the Fisher memory curve (FMC). We show that under some assumptions, the two quantities can be interpreted as squared 'Mahalanobis' norms of images of the input vector under the system's dynamics, and that even though MC and FMC map the memory structure of the system from two quite different perspectives, they can be linked by a close relation.
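For concreteness, the first of these measures, the short term memory capacity, is MC = sum_k r^2(y_k(t), u(t-k)), where y_k is a linear readout trained to recall the input from k steps back. The sketch below estimates it numerically for a random linear reservoir; the state size, spectral radius and sample count are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, washout = 20, 5000, 100
W = rng.normal(size=(N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # spectral radius 0.9
w_in = rng.normal(size=N)

u = rng.uniform(-1, 1, T)                     # i.i.d. input stream
x = np.zeros((T, N))
for t in range(1, T):
    x[t] = W @ x[t - 1] + w_in * u[t - 1]     # linear reservoir

X = x[washout:]
mc = 0.0
for k in range(1, 2 * N):                     # delays 1 .. 2N-1
    target = u[washout - k:T - k]             # input k steps back
    w = np.linalg.lstsq(X, target, rcond=None)[0]   # linear readout
    r = np.corrcoef(X @ w, target)[0, 1]
    mc += r ** 2                              # MC_k = squared correlation
```

For a full-rank linear system MC is bounded by the state dimension N, which the estimate should approach.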

Manuscript from author [PDF]



Regression


ES2012-51

Supervised learning to tune simulated annealing for in silico protein structure prediction

Alejandro Marcos Alvarez, Francis Maes, Louis Wehenkel

Abstract
Simulated annealing is a widely used stochastic optimization algorithm whose efficiency essentially depends on the proposal distribution used to generate the next search state at each step. We propose to adapt this distribution to a family of parametric optimization problems by using supervised machine learning on a sample of search states derived from a set of typical runs of the algorithm over this family. We apply this idea in the context of in silico protein structure prediction.

Manuscript from author [PDF]

ES2012-61

Structural Risk Minimization and Rademacher Complexity for Regression

Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella

Abstract
The Structural Risk Minimization principle allows estimating the generalization ability of a learned hypothesis by measuring the complexity of the entire hypothesis class. Two of the most recent and effective complexity measures are the Rademacher Complexity and the Maximal Discrepancy, which have been applied to the derivation of generalization bounds for kernel classifiers. In this work, we extend their application to the regression framework.

Manuscript from author [PDF]

ES2012-159

Quantile regression with multilayer perceptrons

Joseph Rynkiewicz, Solohaja-Faniaha Dimby

Abstract
We consider nonlinear quantile regression involving multilayer perceptrons (MLP). We investigate the asymptotic behavior of quantile regression in a general framework: first by allowing possibly non-identifiable regression models, such as MLPs with redundant hidden units, then by relaxing the conditions on the density of the noise. We present a universal bound for the overfitting of such models under weak assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP quantile regression model. As an illustration, we use this theoretical result to propose and compare effective criteria for finding the true architecture of a quantile MLP regression model.
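Whatever the model (linear or MLP), quantile regression rests on the standard pinball (check) loss; a minimal sketch:

```python
def pinball_loss(y, q, tau):
    # Pinball (check) loss: its expectation over y is minimized
    # when q equals the tau-quantile of y's distribution.
    d = y - q
    return tau * d if d >= 0 else (tau - 1) * d
```

For tau = 0.5 it reduces to half the absolute error, recovering median regression.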

Manuscript from author [PDF]

ES2012-4

Posterior regularization and attribute assessment of under-determined linear mappings

Marc Strickert, Michael Seifert

Abstract
Linear mappings are omnipresent in data analysis, ranging from regression to distance metric learning. The interpretation of coefficients from under-determined mappings raises an unexpected challenge when the original modeling goal does not impose regularization. Therefore, a general posterior regularization strategy is presented for inducing unique results, and an additional sensitivity analysis enables attribute assessment for facilitating model interpretation. An application to infrared spectra reflects data smoothness and indicates improved generalization.

Manuscript from author [PDF]

ES2012-65

Effects of noise-reduction on neural function approximation

Frank-Florian Steege, Volker Stephan, Horst-Michael Groß

Abstract
Noise disturbance in training data prevents a good approximation of a function by neural networks. To achieve better approximation results we combine neural networks with noise reduction algorithms. We compare different methods to distinguish between samples with high noise level (outliers) in a dataset and samples with low noise level. Drawbacks of common outlier detection approaches are analysed and a new approach is defined which increases the quality of network function approximations. We demonstrate the effects of noise reduction on artificial datasets and on real data from the process control domain.

Manuscript from author [PDF]

ES2012-80

Learning geometric combinations of Gaussian kernels with alternating Quasi-Newton algorithm

David Picard, Nicolas Thome, Matthieu Cord, Alain Rakotomamonjy

Abstract
We propose a novel algorithm for learning a geometric combination of Gaussian kernels jointly with an SVM classifier. This problem is the product counterpart of MKL, restricted to Gaussian kernels. Our algorithm finds a local solution by alternating a Quasi-Newton gradient descent over the kernels and a classical SVM solver over the instances. We show promising results on well-known data sets, which suggest the soundness of the approach.

Manuscript from author [PDF]

ES2012-177

Real time drunkenness analysis in a realistic car simulation

Audrey Robinel, Didier Puzenat

Abstract
This paper describes a blood alcohol content estimation method for car drivers, based on a behavioral analysis performed within a realistic simulation. An artificial neural network learns how to estimate the subject's blood alcohol content. Low-level recordings of user actions on the steering wheel and pedals are used to feed a multilayer perceptron, and a breathalyzer is used to build the set of learning examples (desired outputs). Results are compared with a successful previous work based on a simple video game and demonstrate the "complexity scalability" of the approach.

Manuscript from author [PDF]

ES2012-173

Learning visuo-motor coordination for pointing without depth calculation

Ananda Freire, Andre Lemme, Jochen J. Steil, Guilherme Barreto

Abstract
Pointing refers to orienting a hand, arm, head or body towards an object and is possible without calculating the object's depth and 3D position. We show that pointing can be learned as a holistic direct mapping from an object's pixel coordinates in the visual field to joint angles, which define the pose and orientation of a human or robot. To this aim, we record real-world, noisy training images together with corresponding robot pointing postures for the humanoid robot iCub. We then learn and comparatively evaluate pointing with a multi-layer perceptron, an extreme learning machine and a reservoir network, but also demonstrate that learning fails at reconstructing the depth of trained objects.

Manuscript from author [PDF]



Brain-computer interfaces


ES2012-130

BCI Signal Classification using a Riemannian-based kernel

Alexandre Barachant, Stephane Bonnet, Marco Congedo, Christian Jutten

Abstract
The use of the spatial covariance matrix as a feature is investigated for motor imagery EEG-based classification. A new kernel is derived by establishing a connection with the Riemannian geometry of symmetric positive definite matrices. Different kernels are tested, in combination with support vector machines, on a past BCI competition dataset. We demonstrate that this new approach significantly outperforms state-of-the-art results without the need for spatial filtering.
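One simple member of this family of kernels is a Gaussian kernel on the log-Euclidean distance between covariance matrices; the sketch below uses it as an illustrative stand-in, not as the specific kernel derived in the paper.

```python
import numpy as np

def spd_log(C):
    # Matrix logarithm of a symmetric positive definite matrix
    # via its eigendecomposition.
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def logeuclid_kernel(covs, sigma=1.0):
    # Gaussian kernel on the log-Euclidean distance, a common
    # surrogate for the affine-invariant Riemannian distance
    # between SPD matrices.
    logs = [spd_log(C) for C in covs]
    n = len(logs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d2 = np.linalg.norm(logs[i] - logs[j], "fro") ** 2
            K[i, j] = np.exp(-d2 / (2 * sigma ** 2))
    return K

rng = np.random.default_rng(1)
covs = []
for _ in range(4):
    A = rng.normal(size=(3, 8))       # toy 3-channel trial
    covs.append(A @ A.T / 8)          # sample covariance (SPD)
K = logeuclid_kernel(covs)
```

The resulting Gram matrix `K` can be fed directly to a kernel SVM.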

Manuscript from author [PDF]

ES2012-18

One Class SVM and Canonical Correlation Analysis increase performance in a c-VEP based Brain-Computer Interface (BCI)

Martin Spüler, Wolfgang Rosenstiel, Martin Bogdan

Abstract
The goal of a Brain-Computer Interface (BCI) is to enable communication by pure brain activity without the need for muscle control. Recently, BCIs based on code-modulated visual evoked potentials (c-VEPs) have shown great potential to establish high-performance communication. In this paper we present two new methods to improve classification in a c-VEP BCI. Canonical correlation analysis can be used to build an optimal spatial filter for detection of c-VEPs, while the use of a one-class support vector machine (OCSVM) makes the BCI more robust to artefacts and thus increases performance. We show that both methods increase performance in an offline analysis of data from 8 subjects. As a proof of concept, both methods are tested online with one subject, who achieved an average performance of 133 bit/min, higher than any other bitrate reported so far for a non-invasive BCI.

Manuscript from author [PDF]

ES2012-40

Automatic selection of the number of spatial filters for motor-imagery BCI

Yuan Yang, Sylvain Chevallier, Joe Wiart, Isabelle Bloch

Abstract
Common spatial pattern (CSP) is widely used for constructing spatial filters to extract features for motor-imagery-based BCI. One main parameter in CSP-based classification is the number of spatial filters used. An automatic method relying on Rayleigh quotient is presented to estimate its optimal value for each subject. Based on an existing dataset, we validate the contribution of the proposed method through a study of the effect of this parameter on the classification performance. The evaluation on testing data shows that the estimated subject-specific optimal values yield better performances than the recommended value in the literature.
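For reference, the standard CSP computation that the selection problem sits on top of: the spatial filters are generalized eigenvectors of the two class covariances, and their eigenvalues are the Rayleigh quotients mentioned in the abstract. The toy data and the whitening-based solver below are illustrative; the automatic selection rule itself is the paper's contribution and is not reproduced.

```python
import numpy as np

def csp_filters(X1, X2):
    # Common spatial patterns: solve C1 w = lambda (C1 + C2) w.
    # Eigenvalues (Rayleigh quotients) near 1 or 0 mark the most
    # discriminative spatial filters.
    C1 = sum(x @ x.T for x in X1)
    C1 /= np.trace(C1)
    C2 = sum(x @ x.T for x in X2)
    C2 /= np.trace(C2)
    d, U = np.linalg.eigh(C1 + C2)
    P = U / np.sqrt(d)                  # whitening: P.T (C1+C2) P = I
    lam, V = np.linalg.eigh(P.T @ C1 @ P)
    return P @ V, lam                   # columns of P @ V = filters

# toy EEG: 4 channels, class-dependent variance on channels 0 and 3
rng = np.random.default_rng(2)
scale1 = np.array([3.0, 1.0, 1.0, 1.0])[:, None]
scale2 = np.array([1.0, 1.0, 1.0, 3.0])[:, None]
X1 = [rng.normal(size=(4, 200)) * scale1 for _ in range(20)]
X2 = [rng.normal(size=(4, 200)) * scale2 for _ in range(20)]
W, lam = csp_filters(X1, X2)            # lam sorted ascending
```

Filters at both ends of the eigenvalue spectrum are the usual candidates; choosing how many to keep is exactly the parameter the paper estimates per subject.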

Manuscript from author [PDF]

ES2012-46

The error-related potential and BCIs

Sandra Rousseau, Christian Jutten, Marco Congedo

Abstract
The error-related potential is an event-related potential triggered by errors. Recently it has been the subject of much attention, notably for its possible use in BCI systems. Since it is linked to error occurrence, it could be used in the design of a control loop to build more robust systems. In this paper we study the characteristics of the error potential and present how it could be used to improve BCI systems.

Manuscript from author [PDF]

ES2012-140

Semi-Supervised Neural Gas for Adaptive Brain-Computer Interfaces

Hannes Riechmann, Andrea Finke

Abstract
Non-stationarity is inherent in EEG data. We propose a concept for an adaptive brain computer interface (BCI) that adapts a classifier to the changes in EEG data. It combines labeled and unlabeled data acquired during normal operation of the system. The classifier is based on Fuzzy Neural Gas (FNG), a prototype-based classifier. Based on four data sets we show that retraining the classifier significantly increases classification accuracy. Our approach smoothly adapts to the session-to-session variations in the data.

Manuscript from author [PDF]



Image and time series analysis


ES2012-47

Combined scattering for rotation invariant texture analysis

Laurent Sifre, Stéphane Mallat

Abstract
This paper introduces a combined scattering representation for texture classification, which is invariant to rotations and stable to deformations. A combined scattering is computed with two nested cascades of wavelet transforms and complex modulus, along spatial and rotation variables. Results are compared with state-of-the-art algorithms, with a K-nearest neighbor classifier.

Manuscript from author [PDF]

ES2012-187

Hidden Markov models for time series of counts with excess zeros

Madalina Olteanu, James Ridgway

Abstract
Integer-valued time series are often modeled with Markov models or hidden Markov models (HMM). However, when the series represents count data, it is often subject to excess zeros. In this case, usual distributions such as the binomial or Poisson are unable to estimate the zero mass correctly. In order to overcome this issue, we introduce zero-inflated distributions in the hidden Markov model. The empirical results on simulated and real data show good convergence properties, while excess zeros are better estimated than with classical HMMs.
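The simplest instance of such a distribution is the zero-inflated Poisson; the sketch below gives its probability mass function only, not the full HMM estimation, and the parameter names are illustrative.

```python
import math

def zip_pmf(k, lam, pi):
    # Zero-inflated Poisson: with probability pi emit a structural
    # zero; otherwise draw from an ordinary Poisson(lam).
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson
```

Within the HMM, each hidden state would carry its own (lam, pi) pair, typically estimated by EM; the inflation term lets the zero mass exceed what a plain Poisson can express.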

Manuscript from author [PDF]

ES2012-171

Application of Dynamic Time Warping on Kalman Filtering Framework for Abnormal ECG Filtering

Mohammad Niknazar, Bertrand Rivet, Christian Jutten

Abstract
Existing nonlinear Bayesian filtering frameworks serve as an effective tool for the model-based filtering of noisy ECG recordings. However, since these methods are based on a linear phase assumption, they are unable to simultaneously filter normal and abnormal ECG segments for heart defects where abnormal waves appear only in certain cycles of the ECG. In this paper, a new method based on Dynamic Time Warping (DTW), which exploits information from all channels for nonlinear phase state calculation, is presented. Results on real and synthetic data show that the new method can be successfully applied for filtering normal and abnormal ECG segments simultaneously.
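The alignment underlying the method is the classical DTW dynamic program; a minimal sketch for scalar sequences (the paper's multichannel phase-state calculation built on top of it is not reproduced):

```python
def dtw(a, b):
    # Dynamic time warping: standard O(len(a) * len(b)) dynamic
    # program with symmetric unit steps and absolute-value cost.
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Because the warping path is monotone but not linear, a beat with a locally stretched or repeated wave can still align to the template at zero cost.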

Manuscript from author [PDF]

ES2012-124

Texture classification based on symbolic data analysis

Carlos de Almeida, Renata Souza, Ana Lucia Candeias

Abstract
This article presents a hybrid approach for texture-based image classification using gray-level co-occurrence matrices (GLCM) and a new Fuzzy Kohonen Clustering Network for Symbolic Interval Data (IFKCN). The GLCM matrices extracted from an image database are processed to create the training data set used by the IFKCN algorithm. The IFKCN organizes and extracts prototypes from the processed GLCM matrices. The experimental results are encouraging, with an average success rate of 97.39%.

Manuscript from author [PDF]

ES2012-160

Learning Object-Class Segmentation with Convolutional Neural Networks

Hannes Schulz, Sven Behnke

Abstract
After successes at image classification, segmentation is the next step towards image understanding for neural networks. We propose a convolutional network architecture that includes innovative elements, such as multiple output maps, suitable loss functions, supervised pretraining, multiscale inputs, reused outputs, and pairwise class location filters. Experiments on three data sets show that our method performs on par with current computer vision methods with regard to accuracy and exceeds them in speed.

Manuscript from author [PDF]

ES2012-185

Incremental feature building and classification for image segmentation

Guillaume Bernard, Michel Verleysen, John Lee

Abstract
Image segmentation problems can be solved with classification algorithms. However, their use is limited to features derived from intensities of pixels or patches. Features such as contiguity of two regions cannot be considered without prior knowledge of one of the two class labels. Instead of stacking various classification algorithms, we describe an incremental scheme with a KNN classifier that works in a space where feature relevance is progressively updated. Feature relevance can smoothly vary from total ignorance to absolute certainty. Experiments on artificial images demonstrate the capabilities of this incremental scheme.

Manuscript from author [PDF]



Interpretable models in machine learning


ES2012-7

Making machine learning models interpretable

Alfredo Vellido, José D. Martín-Guerrero, Paulo Lisboa

Abstract
Data of different levels of complexity and of ever growing diversity of characteristics are the raw materials that machine learning practitioners try to model using their wide palette of methods and tools. The obtained models are meant to be a synthetic representation of the available, observed data that captures some of their intrinsic regularities or patterns. Therefore, the use of machine learning techniques for data analysis can be understood as a problem of pattern recognition or, more informally, of knowledge discovery and data mining. There exists a gap, though, between data modeling and knowledge extraction. Models, depending on the machine learning techniques employed, can be described in diverse ways but, in order to consider that some knowledge has been achieved from their description, we must take into account the human cognitive factor that any knowledge extraction process entails. These models as such can be rendered powerless unless they can be interpreted, and the process of human interpretation follows rules that go well beyond technical prowess. For this reason, interpretability is a paramount quality that machine learning methods should aim to achieve if they are to be applied in practice. This paper is a brief introduction to the special session on interpretable models in machine learning, organized as part of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. It includes a discussion on the several works accepted for the session, with an overview of the context of wider research on interpretability of machine learning models.

Manuscript from author [PDF]

ES2012-36

Interval coded scoring systems for survival analysis

Vanya Van Belle, Sabine Van Huffel, Johan Suykens, Stephen Boyd

Abstract
Black-box mathematical models are powerful tools in classification and regression problems. Thanks to the use of (unknown) transformations of the inputs, the outcome can be estimated with improved performance in comparison to standard statistical models. A disadvantage of these complex models, however, is their lack of interpretability. This work illustrates how advanced methods can be made interpretable. Using constant B-spline kernel functions and sparsity constraints, interval coded scoring models for survival analysis are presented.

Manuscript from author [PDF]

ES2012-99

Visualizing the quality of dimensionality reduction

Bassam Mokbel, Wouter Lueks, Andrej Gisbrecht, Michael Biehl, Barbara Hammer

Abstract
Many different evaluation measures for dimensionality reduction can be summarized based on the co-ranking framework [Lee and Verleysen, 2009]. Here, we extend this framework in two ways: (i) we show that the current parameterization of the quality shows unpredictable behavior, even in simple settings, and we propose a different parameterization which yields more intuitive results; (ii) we propose how to link the quality to point-wise quality measures which can directly be integrated into the visualization.

Manuscript from author [PDF]

ES2012-162

Unmixing Hyperspectral Images with Fuzzy Supervised Self-Organizing Maps

Thomas Villmann, Erzsebet Merenyi, William H. Farrand

Abstract
We propose a powerful alternative to customary linear spectral unmixing, with a new neural model, which achieves locally linear but globally non-linear unmixing. This enables unmixing with respect to a large number of endmembers, while traditional linear unmixing is limited to a handful of endmembers.

Manuscript from author [PDF]

ES2012-169

Constructing similarity networks using the Fisher information metric

Héctor Ruiz, Sandra Ortega, Ian Jarman, José D. Martín-Guerrero, Paulo Lisboa

Abstract
The Fisher information metric defines a Riemannian space where distances reflect similarity with respect to a given probability distribution. This metric can be used during the process of building a relational network, resulting in a structure that is informed about the similarity criterion. Furthermore, the relational nature of this network allows for an intuitive interpretation of the data through their location within the network and the way it relates to the most representative cases or prototypes.

Manuscript from author [PDF]

ES2012-28

Extended visualization method for classification trees

José M. Martínez-Martínez, Pablo Escandell-Montero, Emilio Soria-Olivas, José D. Martín-Guerrero, Juan Gómez-Sanchis, Joan Vila-Francés

Abstract
Classification tree analysis is one of the main techniques used in data mining, yet there is a lack of visualization methods that support this tool. Graphical procedures can therefore be developed to help simplify interpretation and obtain a better understanding. This paper proposes a method for representing the input data distribution for each class present in each terminal node. For this purpose, the new visualization method Sectors on Sectors (SonS), proposed in [1], is used. The methodology is tested on two real data sets.

Manuscript from author [PDF]

ES2012-29

Cartogram representation of the batch-SOM magnification factor

Alessandra Tosi, Alfredo Vellido

Abstract
Model interpretability is a problem of knowledge extraction from the patterns found in raw data. One key source of knowledge is information visualization, which can help us to gain insights into a problem through graphical representations and metaphors. Nonlinear dimensionality reduction techniques can provide flexible visual insight, but the locally varying representation distortion they produce makes interpretation far from intuitive. In this paper, we define a cartogram method, based on techniques of geographic representation, that allows reintroducing this distortion, measured as a magnification factor, in the visual maps of the batch-SOM model. It does so while preserving the topological continuity of the representation.

Manuscript from author [PDF]

ES2012-57

Integration of Structural Expert Knowledge about Classes for Classification Using the Fuzzy Supervised Neural Gas

Marika Kästner, Wieland Hermann, Thomas Villmann

Abstract
In this paper we describe a methodology for integrating structural expert knowledge about class relations into classification schemes, for models that judge class dissimilarities using a unary class coding scheme. In particular, we suggest incorporating this information into the class dissimilarity measure of those models.

Manuscript from author [PDF]

ES2012-148

Similarity networks for heterogeneous data

Lluís Belanche, Jerónimo Hernández

Abstract
A two-layer neural network is developed in which the neuron model computes a user-defined similarity function between inputs and weights. The neuron model is formed by the composition of an adapted logistic function with the mean of the partial input-weight similarities. The model is capable of dealing directly with variables of potentially different nature (continuous, ordinal, categorical); there is also provision for missing values. The network is trained using a fast two-stage procedure and involves the setting of only one parameter. In our experiments, the network achieves slightly superior performance on a set of challenging problems with respect to both RBF nets and RBF-kernel SVMs.

Manuscript from author [PDF]

ES2012-167

Discriminant functional gene groups identification with machine learning and prior knowledge

Grzegorz Zycinski, Margherita Squillario, Annalisa Barla, Tiziana Sanavia, Alessandro Verri, Barbara Di Camillo

Abstract
In computational biology, the analysis of high-throughput data poses several issues on the reliability, reproducibility and interpretability of the results. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list interpretability is to integrate biological information from genomic databases in the learning process. Here we propose SVS, a machine learning based pipeline that incorporates domain biological knowledge a priori to structure the data matrix before the feature selection and classification phases. The pipeline is completed by a final step of semantic clustering and visualization. The clustering phase provides further interpretability of the results, allowing the identification of their biological meaning. To prove the efficacy of this procedure we analyzed a public dataset on prostate cancer.

Manuscript from author [PDF]



Machine ensembles: theory and applications


ES2012-9

An Exploration of Research Directions in Machine Ensemble Theory and Applications

Anibal Figueiras-Vidal, Lior Rokach

Abstract
A concise overview of the fundamentals and the main types of machine ensembles serves to propose a structured perspective for the papers that are included in this special session. The subsequent brief discussion of the works, emphasizing their principal contributions, permits an extraction of a series of suggestions for further research in the fruitful area of ensemble learning.

Manuscript from author [PDF]

ES2012-19

On the Independence of the Individual Predictions in Parallel Randomized Ensembles

Daniel Hernández-Lobato, Gonzalo Martínez-Muñoz, Alberto Suárez

Abstract
In randomized parallel ensembles the class label predictions for a particular instance by different ensemble classifiers are independent random variables. Taking advantage of this independence we design a statistical test to identify instances near the decision borders, which are difficult to classify because of their proximity to these borders. For these instances, the performance of the ensemble is poor and approaches random guessing. The validity of this analysis and the usefulness of the statistical test proposed are illustrated in several real-world classification problems.

Manuscript from author [PDF]

ES2012-121

Introducing diversity among the models of multi-label classification ensemble

Lena Chekina, Lior Rokach, Bracha Shapira

Abstract
A number of ensemble algorithms for solving multi-label classification problems have been proposed in recent years. Diversity among the base learners is known to be important for constructing a good ensemble. In this paper we define a method for introducing diversity among the base learners of one of the previously presented multi-label ensemble classifiers. An empirical comparison on 10 datasets demonstrates that model diversity leads to an improvement in prediction accuracy in 80% of the evaluated cases. Additionally, in most cases the proposed "diverse" ensemble method outperforms other multi-label ensembles as well.

Manuscript from author [PDF]

ES2012-141

Distributed learning via Diffusion adaptation with application to ensemble learning

Zaid Towfic, Jianshu Chen, Ali Sayed

Abstract
We examine the problem of learning a set of parameters from a distributed dataset. We assume the datasets are collected by agents over a distributed ad-hoc network, and that communication of the actual raw data is prohibitive due to either privacy or communication constraints. We propose a distributed algorithm for online learning that is proven to guarantee a bounded excess risk, and the bound can be made arbitrarily small for sufficiently small step-sizes. We apply our framework to the expert advice problem, where nodes learn the weights for the trained experts in a distributed fashion.

Manuscript from author [PDF]

ES2012-26

Regularized Committee of Extreme Learning Machine for Regression Problems

Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas, Josep Guimerá-Tomás, Marcelino Martínez-Sober, Antonio J. Serrano-López

Abstract
Extreme learning machine (ELM) is an efficient learning algorithm for single-hidden layer feedforward networks (SLFN). This paper proposes the combination of ELM networks using a regularized committee. Simulations on many real-world regression data sets have demonstrated that this algorithm generally outperforms the original ELM algorithm.
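A minimal sketch of the building block: an ELM is a random, fixed hidden layer plus a ridge-regression readout solved in closed form, and a committee combines several independently initialized ELMs. Plain averaging is used here for illustration; the paper's regularized combination of members is not reproduced.

```python
import numpy as np

def elm_fit(X, y, H=50, reg=1e-2, seed=0):
    # Random hidden layer; only the readout weights beta are
    # learned, via ridge regression in closed form.
    rng = np.random.default_rng(seed)
    Win = rng.normal(size=(X.shape[1], H))
    b = rng.normal(size=H)
    Phi = np.tanh(X @ Win + b)
    beta = np.linalg.solve(Phi.T @ Phi + reg * np.eye(H), Phi.T @ y)
    return Win, b, beta

def elm_predict(X, model):
    Win, b, beta = model
    return np.tanh(X @ Win + b) @ beta

# committee: average the outputs of independently initialized ELMs
X = np.linspace(-3, 3, 200)[:, None]
y = np.sin(X[:, 0])
models = [elm_fit(X, y, seed=s) for s in range(5)]
pred = np.mean([elm_predict(X, m) for m in models], axis=0)
```

Averaging over random initializations reduces the variance that a single random hidden layer introduces.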

Manuscript from author [PDF]

ES2012-156

Linear kernel combination using boosting

Alexis Lechervy, Philippe-Henri Gosselin, Frédéric Precioso

Abstract
In this paper, we propose a novel algorithm to design multiclass kernels based on an iterative combination of weak kernels in a schema inspired by the boosting framework. Our solution has linear complexity in the training set size. We evaluate our method for classification, first on a toy example by integrating our multi-class kernel into a kNN classifier and comparing our results with a reference iterative kernel design method, and then for image categorization by considering a classic image database and comparing our boosted linear kernel combination with the direct linear combination of all features in a linear SVM.

Manuscript from author [PDF]

ES2012-158

The stability of feature selection and class prediction from ensemble tree classifiers

Jérôme Paul, Michel Verleysen, Pierre Dupont

Abstract
The bootstrap aggregating procedure at the core of ensemble tree classifiers reduces, in most cases, the variance of such models while offering good generalization capabilities. The average predictive performance of such ensembles is known to improve up to a certain point as the ensemble size increases. The present work studies this convergence in contrast to the stability of the class prediction and of the variable selection performed while and after growing the ensemble. Experiments on several biomedical datasets, using random forests or bagging of decision trees, show that class prediction and, most notably, variable selection typically require orders of magnitude more trees to become stable.

Manuscript from author [PDF]

[Back to Top]


Bayesian and graphical models, optimization


ES2012-183

Sparse Nonparametric Topic Model for Transfer Learning

Ali Faisal, Jussi Gillberg, Jaakko Peltonen, Gayle Leen, Samuel Kaski

Abstract
Count data arises for example in bioinformatics or analysis of text documents represented as word count vectors. With several data sets available from related sources, like papers in related conference tracks, exploiting their similarities by transfer learning can improve models compared to modeling sources independently. We introduce a Bayesian generative transfer learning model which represents similarity across document collections by sparse sharing of latent topics controlled by an Indian Buffet Process. Unlike Hierarchical Dirichlet Process based multi-task learning, our model decouples topic sharing probability from topic strength, making sharing of low-strength topics easier, and outperforms the HDP approach in experiments.

Manuscript from author [PDF]

ES2012-94

Assessment of sequential Boltzmann machines on a lexical processing task

Alberto Testolin, Alessandro Sperduti, Ivilin Stoianov, Marco Zorzi

Abstract
Recently, a promising probabilistic model based on Boltzmann Machines, the Recurrent Temporal RBM, has been proposed. It is able to learn physical dynamics (e.g. videos of bouncing balls); however, up to now it was not clear whether this ability extends to symbolic tasks. Here we assess its capabilities in learning graphotactic rules from a set of English words. It emerged that the model is able to extract local transition rules between items of a sequence, but it does not seem to be suited to encoding a whole word.

Manuscript from author [PDF]

ES2012-165

Functional Mixture Discriminant Analysis with hidden process regression for curve classification

Faicel Chamroukhi, Hervé Glotin, Céline Rabouy

Abstract
We present a new mixture model-based discriminant analysis approach for functional data using a specific hidden process regression model. The approach allows for fitting flexible curve-models to each class of complex-shaped curves presenting regime changes. The model parameters are learned by maximizing the observed-data log-likelihood for each class by using a dedicated expectation-maximization (EM) algorithm. Comparisons on simulated data with alternative approaches show that the proposed approach provides better results.

Manuscript from author [PDF]

ES2012-95

An analysis of Gaussian-binary restricted Boltzmann machines for natural images

Nan Wang, Jan Melchior, Laurenz Wiskott

Abstract
A Gaussian-binary restricted Boltzmann machine is a widely used energy-based model for continuous data distributions, although many authors have reported difficulties in training it on natural images. To clarify the model's capabilities and limitations, we rewrite its probability density function as a linear superposition of Gaussians. Based on this formula we show how Gaussian-binary RBMs learn natural image statistics. However, the resulting probability density function is not a good representation of the data distribution.

Manuscript from author [PDF]

ES2012-27

Learning task relatedness via Dirichlet Process priors for linear regression models

Marcel Hermkes, Nicolas Kuehn, Carsten Riggelsen

Abstract
In this paper we present a hierarchical model of linear regression functions in the context of multi-task learning. The parameters of the linear model are coupled by a Dirichlet Process (DP) prior, which implies a clustering of related functions for different tasks. To make approximate Bayesian inference under this model we apply the Bayesian Hierarchical Clustering (BHC) algorithm. The experiments are conducted on two real world problems: (i) school exam score prediction and (ii) prediction of ground-motion parameters. In comparison to baseline methods with no shared prior the results show an improved prediction performance when using the hierarchical model.

Manuscript from author [PDF]

ES2012-33

EMFit based Ultrasonic Phased Arrays with evolved Weights for Biomimetic Target Localization

Jan Steckel, Andre Boen, Dieter Vanderest, Herbert Peremans

Abstract
Bats use the spatial filtering performed by their pinnae in localization tasks. We propose a similar localization scheme based on the spatial filtering of the received echoes by a phased array. By evolving the weights of a linear phased array using a genetic algorithm, a very efficient spatial filter can be implemented. The localization performance of the evolved array in combination with the biomimetic localization algorithm is compared to a standard phased array localization scheme.

Manuscript from author [PDF]

[Back to Top]


Unsupervised learning


ES2012-31

Magnitude Sensitive Competitive Learning

Enrique Pelayo, David Buldain, Carlos Orrite

Abstract
This paper presents a new algorithm, Magnitude Sensitive Competitive Learning (MSCL), which is able to distribute the unit weights following any magnitude calculated from the unit parameters or from the input data inside the Voronoi region of the unit. This controlled behavior makes it possible to surpass other standard Competitive Learning algorithms, which only tend to concentrate neurons according to the input data density. Several application examples with different magnitude functions illustrate the possibilities of MSCL.

Manuscript from author [PDF]

ES2012-152

From neuronal cost-based metrics towards sparse coded signals classification

Anthony Mouraud, Quentin Barthélemy, Aurélien Mayoue, Cédric Gouy-Pailler, Anthony Larue, Hélène Paugam-Moisy

Abstract
Sparse signal decompositions are keys to efficient compression, storage and denoising, but they lack appropriate methods to exploit this sparsity for a classification purpose. Sparse coding methods based on dictionary learning may result in spikegrams, a sparse and temporal representation of signals by a raster of kernel occurrences through time. This paper proposes a method for coupling spike train cost-based metrics (from neuroscience) with spikegram sparse decompositions, for clustering multivariate signals. Experiments on character trajectories, recorded by sensors from natural handwriting, prove the validity of the approach, compared with currently available classification performance in literature.

Manuscript from author [PDF]

ES2012-163

Hybrid hierarchical clustering: cluster assessment via cluster validation indices

Mark Embrechts, Jonathan Linton, Christopher Gatti

Abstract
This paper introduces a novel method for speeding up hierarchical clustering by seeding it with the clusters obtained from a different clustering method (e.g., K-means). A benchmark study compares the cluster performance of hierarchical clustering with and without cluster seeding, based on several cluster performance indices, using a wide variety of real-world and artificial benchmark data sets. While cluster seeding can significantly speed up agglomerative hierarchical clustering, it will also affect the cluster quality, and thus the validation indices as well. Extensive benchmarks show that the impact of cluster seeding is often rather small.

Manuscript from author [PDF]

ES2012-48

Unsupervised learning of motion patterns

Thomas Guthier, Julian Eggert, Volker Willert

Abstract
Neurophysiological findings suggest that the visual cortex of mammals contains neural populations that are sensitive to specific motion patterns. In this paper, we present a new method to learn such patterns in an unsupervised way. To represent motion, dense optical flow fields of videos containing humans performing several actions like walking and running are estimated. We introduce VNMF, an extension of the translation invariant NMF that works on flow fields, along with a new energy term that enforces parts-basedness. VNMF incorporates three principles found in neural motion processing: Sparsity, non-negativity and direction selectivity. We find that the extracted motion patterns are shaped like body parts, which supports the idea that the representation of biological motion is directly linked to the shape of an object.

Manuscript from author [PDF]

ES2012-52

Robust clustering of high-dimensional data

Anastasios Bellas, Charles Bouveyron, Marie Cottrell, Jérôme Lacaille

Abstract
We address the problem of robust clustering of high-dimensional data, which is recurrent in real-world applications. Existing robust clustering methods are unfortunately sensitive in high dimension, while existing approaches for high-dimensional data are in general not robust. We propose a hybrid iterative EM-based algorithm that combines an efficient high-dimensional clustering algorithm and the trimming technique. We test our algorithm on synthetic and real-world data from the domain of aircraft engine health monitoring and show its efficiency for high-dimensional noisy datasets.

Manuscript from author [PDF]

ES2012-78

Image reconstruction using an iterative SOM based algorithm

Manel Jouini, Sylvie Thiria, Michel Crépon

Abstract
The frequent presence of clouds in optical remotely sensed imagery prevents space and time continuity and limits its exploitation. The aim of this study is to propose a new statistical processing approach for the reconstruction of areas covered by clouds in a time sequence of optical satellite images. The approach is an iterative SOM-based algorithm and was applied to reconstruct ocean color images. It uses the information contained in the color images and a set of satellite-derived dynamic ocean products (sea surface temperature: SST; altimetry: SSH) to reproduce the local spatio-temporal relationships of the cloudy images. The reconstruction method is general and can be applied to fill gaps in multi-dimensional and correlated data.

Manuscript from author [PDF]

[Back to Top]


Statistical methods and kernel-based algorithms


ES2012-10

Deconvolution in nonparametric statistics

Kris De Brabanter, Bart De Moor

Abstract
In this tutorial paper we give an overview of deconvolution problems in nonparametric statistics. First, we consider the problem of density estimation given a contaminated sample. We illustrate that the classical Rosenblatt-Parzen kernel density estimator is unable to capture the full shape of the density while the presented method experiences almost no problems. Second, we use the previous estimator in a nonparametric regression framework with errors-in-variables.

Manuscript from author [PDF]

ES2012-170

Weighted/Structured Total Least Squares problems and polynomial system solving

Philippe Dreesen, Kim Batselier, Bart De Moor

Abstract
Weighted and Structured Total Least Squares (W/STLS) problems are generalizations of Total Least Squares with additional weighting and/or structure constraints. W/STLS problems are found at the heart of several fields of mathematical engineering, such as statistics and systems theory, and are typically solved by local optimization methods, which have the drawback that one cannot guarantee global optimality of the retrieved solution. This paper employs the Riemannian SVD formulation to write the W/STLS problem as a system of polynomial equations. Using a novel matrix technique for solving systems of polynomial equations, the globally optimal solution of the W/STLS problem is retrieved.

Manuscript from author [PDF]

ES2012-149

Joint Regression and Linear Combination of Time Series for Optimal Prediction

Dries Geebelen, Kim Batselier, Philippe Dreesen, Marco Signoretto, Johan Suykens, Bart De Moor, Joos Vandewalle

Abstract
In most machine learning applications the time series to predict is fixed, and one has to learn a prediction model that causes the smallest error. In this paper, choosing the time series to predict is part of the optimization problem; this time series has to be a linear combination of a priori given time series. The resulting optimization problem can be formulated as choosing the linear combination of a priori known matrices such that the smallest singular value is minimized. This problem has many local minima and can be formulated as a polynomial system, which we solve using a polynomial system solver. The proposed prediction algorithm has applications in algorithmic trading, in which a linear combination of stocks will be bought.

Manuscript from author [PDF]

ES2012-60

Averaging of kernel functions

Lluís Belanche, Alessandra Tosi

Abstract
In kernel-based machines, the integration of several kernels to build more flexible learning methods is a promising avenue for research. In particular, in Multiple Kernel Learning a compound kernel is built by learning a kernel that is the weighted mean of several sources. We show in this paper that the only feasible average for kernel learning is precisely the arithmetic average. We also show that three familiar means for positive real values (the geometric, inverse root mean square and harmonic means) actually generate valid kernels.
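The property underlying the arithmetic case can be checked numerically on Gram matrices: a symmetric Gram matrix corresponds to a valid kernel on the sample exactly when it is positive semi-definite, and the arithmetic mean of two PSD matrices is again PSD. The snippet below is a generic illustration of that check, not the paper's code; the sample and kernel choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))  # 30 points in R^4

# Gram matrices of two valid kernels on the same sample.
K_lin = X @ X.T                                           # linear kernel
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
K_rbf = np.exp(-0.5 * sq)                                 # Gaussian RBF kernel

# Arithmetic mean of the two kernels.
K_mean = 0.5 * (K_lin + K_rbf)

# Valid kernel <=> positive semi-definite Gram matrix:
# all eigenvalues must be (numerically) non-negative.
min_eig = float(np.linalg.eigvalsh(K_mean).min())
```

A negative `min_eig` (beyond numerical noise) would disqualify the averaged matrix as a kernel; here it stays non-negative, as the PSD-cone argument guarantees.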

Manuscript from author [PDF]

ES2012-37

Maximum likelihood estimation and polynomial system solving

Kim Batselier, Philippe Dreesen, Bart De Moor

Abstract
This article presents an alternative method to find the global maximum likelihood estimates of the mixing probabilities of a mixture of multinomial distributions. For these mixture models it is shown that the maximum likelihood estimates of the mixing probabilities correspond with the roots of a multivariate polynomial system. A new algorithm, set in a linear algebra framework, is presented which allows one to find all these roots by solving a generalized eigenvalue problem.

Manuscript from author [PDF]

[Back to Top]


Classification and model selection


ES2012-43

L1-based compression of random forest models

Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel

Abstract
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.

Manuscript from author [PDF]

ES2012-88

RNN Based Batch Mode Active Learning Framework

Gaurav Maheshwari, Vikram Pudi

Abstract
Active Learning has been applied in many real-world classification tasks to reduce the amount of labeled data required for training a classifier. However, most of the existing active learning strategies select only a single sample for labeling by the oracle in every iteration. This requires retraining the classifier after each sample is added, which is computationally expensive. Moreover, many of the existing sample selection strategies are not suitable for multi-class classification tasks. To overcome these issues, we propose an efficient batch-mode framework for active learning using the notion of influence sets based on Reverse Nearest Neighbors, which is applicable to multi-class classification as well. To demonstrate the effectiveness of our technique, we compare its performance against existing active learning techniques on real-life datasets. Experimental results show that our technique significantly outperforms existing active learning methods, especially on multi-class datasets.

Manuscript from author [PDF]

ES2012-85

Adaptive learning for complex-valued data

Kerstin Bunte, Frank-Michael Schleif, Michael Biehl

Abstract
In this paper we propose a variant of the Generalized Matrix Learning Vector Quantization (GMLVQ) for dissimilarity learning on complex-valued data. Complex features can be encountered in various data domains, e.g. stemming from Fourier transform ion cyclotron resonance mass spectrometry and image analysis. Current approaches deal with complex inputs by ignoring the imaginary parts or concatenating real and imaginary parts into a longer real-valued vector. In this contribution we propose a prototype-based classification method which can deal with complex-valued data in its natural form. The algorithm is demonstrated on a benchmark data set and for leaf recognition using Zernike moments. We observe that the complex version converges much faster compared to the original GMLVQ evaluated on the real parts only. The complex version has fewer free parameters than using a concatenated vector and is thus computationally more efficient than the original GMLVQ.

Manuscript from author [PDF]

ES2012-111

Automatic Group-Outlier Detection

Amine Chaibi, Hanane Azzag, Mustapha Lebbah

Abstract
We propose in this paper a new measure, called GOF (Group Outlier Factor), to detect group outliers. To validate this measure we integrated it into a clustering process using a Self-Organizing Map. The proposed approach is based on the relative density of each group of data and simultaneously provides a partitioning of the data and a quantitative indicator (GOF). The obtained results are encouraging and support further work in this direction.

Manuscript from author [PDF]

ES2012-176

A CUSUM approach for online change-point detection on curve sequences

Nicolas Cheifetz, Allou Samé, Patrice Aknin, Emmanuel de Verdalle

Abstract
Anomaly detection on sequential data is common in many domains such as fraud detection for credit cards, intrusion detection for cyber-security or military surveillance. This paper presents a new CUSUM-like method for change-point detection on curve sequences, in the context of preventive maintenance of transit bus door systems. The proposed approach is derived from a specific generative model of curves. The system is considered out of control when the parameters of the curve density change. Experimental studies performed on real-world data demonstrate the promising behavior of the proposed method.
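The detector above builds on the classical CUSUM statistic, shown here in its standard scalar form for an upward mean shift. The paper itself works with a generative model of whole curves, so this is only the underlying idea; the parameter values are illustrative.

```python
def cusum_alarm(x, target, k=0.5, h=5.0):
    """One-sided CUSUM for detecting an upward mean shift.

    target: in-control mean; k: allowance (slack subtracted per sample);
    h: decision threshold. Returns the index of the first alarm, or None.
    """
    s = 0.0
    for i, xi in enumerate(x):
        s = max(0.0, s + (xi - target - k))  # accumulate evidence of a shift
        if s > h:
            return i
    return None

# In-control at mean 0 for 100 samples, then a shift to mean 2.
x = [0.0] * 100 + [2.0] * 100
alarm = cusum_alarm(x, target=0.0)  # → 103: a few samples after the change
```

The statistic resets to zero while the process is in control and drifts upward once the mean shifts, so the detection delay is governed by the shift size relative to the allowance `k` and the threshold `h`.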

Manuscript from author [PDF]

ES2012-13

One-class classifier based on extreme value statistics

David Martínez-Rego, Evan Kriminger, Jose C. Principe, Oscar Fontenla-Romero, Amparo Alonso-Betanzos

Abstract
Interest in One-Class Classification methods has soared in recent years due to their wide applicability in many practical problems where classification in the absence of counterexamples is needed. In this paper, a new one-class classification rule based on order statistics is presented. It relies only on the embedding of the classification problem into a metric space, so it is suitable for Euclidean or other structured mappings. The suitability of the proposed method is assessed through a comparison on both artificial and real-life data sets. The good results obtained pave the road to its application in practical novelty detection problems.

Manuscript from author [PDF]

ES2012-139

Classifying Scotch Whisky from near-infrared Raman spectra with a Radial Basis Function Network with Relevance Learning

Andreas Backhaus, Praveen Cheriyan Ashok, Bavishna Balagopal Praveen, Kishan Dholakia, Udo Seiffert

Abstract
The instantaneous assessment of high-priced liquor products with minimal sample volume and no special preparation is an important task for quality monitoring and fraud detection. In this contribution the automated classification of Raman spectra acquired with a special optofluidic chip is performed with a number of Artificial Neural Networks. A standard Radial Basis Function Network is adapted to incorporate relevance learning and shows robust classification performance across classification tasks. The acquired relevance weighting per feature dimension can be used to reduce the number of features while retaining a high level of accuracy.

Manuscript from author [PDF]

ES2012-81

Supervised and unsupervised classification approaches for human activity recognition using body-mounted sensors

Dorra Trabelsi, Samer Mohammed, Faicel Chamroukhi, Latifa Oukhellou, Yacine Amirat

Abstract
In this paper, the activity recognition problem from 3-d acceleration data measured with body-worn accelerometers is formulated as a problem of multidimensional time series segmentation and classification. More specifically, the proposed approach uses a statistical model based on Multiple Hidden Markov Model Regression (MHMMR) to automatically analyze the human activity. The method takes into account the sequential appearance and temporal evolution of the data to easily detect activities and transitions. Classification results obtained by comparing the proposed approach to those of the standard supervised classification approaches as well as the standard hidden Markov model show that the proposed approach is promising.

Manuscript from author [PDF]

ES2012-86

Matrix relevance LVQ in steroid metabolomics based classification of adrenal tumors

Michael Biehl, Petra Schneider, David Smith, Han Stiekema, Angela Taylor, Beverly Hughes, Cedric Shackleton, Paul Stewart, Wiebke Arlt

Abstract
We present a machine learning system for the differential diagnosis of benign adrenocortical adenoma (ACA) vs. malignant adrenocortical carcinoma (ACC). The data employed for the classification are urinary excretion values of 32 steroid metabolites. We apply prototype-based classification techniques to discriminate the classes, in particular, we use modifications of Generalized Learning Vector Quantization including matrix relevance learning. The obtained system achieves high sensitivity and specificity and outperforms previously used approaches for the detection of adrenal malignancy. Moreover, the method identifies a subset of most discriminative markers which facilitates its future use as a non-invasive high-throughput diagnostic tool.

Manuscript from author [PDF]

ES2012-154

Recognition of HIV-1 subtypes and antiretroviral drug resistance using weightless neural networks

Caio Souza, Flavio Nobre, Priscila Lima, Robson Silva, Rodrigo Brindeiro, Felipe França

Abstract
This work presents an application of an improved version of the WiSARD weightless neural network to the recognition of different mutation types of HIV-1 and to the determination of antiretroviral drug resistance. The data set used consists of 1205 gene sequences of the HIV-1 protease of subtypes B, C and F from patients under treatment failure. Experiments performed with the bleaching technique over the WiSARD model under different data representation strategies have shown promising results, both in terms of accuracy and standard deviation.

Manuscript from author [PDF]

ES2012-32

Adaptive Optimization for Cross Validation

Alessandro Rudi, Gabriele Chiusano, Alessandro Verri

Abstract
The process of model selection and assessment aims at finding a subset of parameters that minimizes the expected test error for a model related to a learning algorithm. Given a subset of tuning parameters, an exhaustive grid search is typically performed. In this paper an automatic algorithm for model selection and assessment is proposed. It adaptively learns the error function in the parameter space, making use of Scale Space theory and Statistical Learning theory in order to estimate a reduced number of models and, at the same time, to make them more reliable. Extensive experiments are performed on the MNIST dataset.

Manuscript from author [PDF]

ES2012-62

The `K' in K-fold Cross Validation

Davide Anguita, Luca Ghelardoni, Alessandro Ghio, Luca Oneto, Sandro Ridella

Abstract
The K-fold Cross Validation (KCV) technique is one of the approaches most used by practitioners for model selection and error estimation of classifiers. The KCV consists in splitting a dataset into k subsets; then, iteratively, some of them are used to learn the model, while the others are exploited to assess its performance. However, in spite of the KCV's success, only practical rules of thumb exist to choose the number and the cardinality of the subsets. We propose here an approach that allows tuning the number of subsets of the KCV in a data-dependent way, so as to obtain a reliable, tight and rigorous estimation of the probability of misclassification of the chosen model.
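The estimate whose dependence on k the abstract studies is simple to state. A minimal sketch of plain KCV follows (illustrative code, not the authors'; the nearest-centroid classifier is only a stand-in for the model under selection):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Randomly split n sample indices into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def kfold_error(X, y, k, fit, predict):
    """Average held-out misclassification rate over the k folds."""
    folds = kfold_indices(len(y), k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errs.append(np.mean(predict(model, X[test]) != y[test]))
    return float(np.mean(errs))

# Toy stand-in classifier: nearest class centroid.
def fit(X, y):
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def predict(model, X):
    d0 = np.linalg.norm(X - model[0], axis=1)
    d1 = np.linalg.norm(X - model[1], axis=1)
    return (d1 < d0).astype(int)

# Two well-separated classes: the KCV error estimate should be zero.
X = np.vstack([np.full((20, 2), -5.0), np.full((20, 2), 5.0)])
y = np.array([0] * 20 + [1] * 20)
err = kfold_error(X, y, k=5, fit=fit, predict=predict)
```

The choice of `k` trades off the bias of training on fewer samples against the variance and cost of many folds, which is exactly the trade-off the paper proposes to resolve in a data-dependent way.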

Manuscript from author [PDF]

[Back to Top]


Recent developments in clustering algorithms


ES2012-5

Recent developments in clustering algorithms

Charles Bouveyron, Barbara Hammer, Thomas Villmann

Abstract

Manuscript from author [PDF]

ES2012-30

Curves clustering with approximation of the density of functional random variables

Julien Jacques, Cristian Preda

Abstract
Model-based clustering for functional data is considered. An alternative to model-based clustering using the functional principal components is proposed by approximating the density of functional random variables. The EM algorithm is used for parameter estimation and the maximum a posteriori rule provides the clusters. A real data application illustrates the interest of the proposed methodology.

Manuscript from author [PDF]

ES2012-22

Modified Conn-Index for the evaluation of fuzzy clusterings

Tina Geweniger, Marika Kästner, Mandy Lange, Thomas Villmann

Abstract
We propose an extension of the Conn-Index to evaluate fuzzy cluster solutions obtained from fuzzy prototype vector quantization, whereas the original Conn-Index was designed for crisp vector quantization models. The fuzzy index explicitly takes the fuzzy assignments resulting from fuzzy vector quantization into account. This avoids the information loss which would occur if the original crisp index is applied to fuzzy solutions.

Manuscript from author [PDF]

ES2012-107

Modularity-based clustering for network-constrained trajectories

Mohamed Khalil El Mahrsi, Fabrice Rossi

Abstract
We present a novel clustering approach for moving object trajectories that are constrained by an underlying road network. The approach builds a similarity graph based on these trajectories, then uses modularity-optimizing hierarchical graph clustering to regroup trajectories with similar profiles. Our experimental study shows the superiority of the proposed approach over classic hierarchical clustering and gives a brief insight into the visualization of the clustering results.

Manuscript from author [PDF]

ES2012-127

A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms

Matthieu Durut, Benoit Patra, Fabrice Rossi

Abstract
This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain time speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better performance than the sequential algorithm. Another distributed scheme is therefore introduced which obtains the expected speed-ups. It is then improved to fit implementation on distributed architectures where communications are slow and inter-machine synchronization is too costly. The schemes are tested on simulated distributed architectures and, for the last one, on the Microsoft Windows Azure platform, obtaining speed-ups with up to 32 VMs.

Manuscript from author [PDF]

ES2012-132

Dissimilarity Clustering by Hierarchical Multi-Level Refinement

Brieuc Conan-Guez, Fabrice Rossi

Abstract
We introduce in this paper a new way of optimizing the natural extension to dissimilarity data of the quantization error used in k-means clustering. The proposed method is based on hierarchical clustering combined with multi-level heuristic refinement. The method is computationally efficient and achieves better quantization errors than the relational k-means.

Manuscript from author [PDF]

ES2012-188

Relevance learning for time series inspection

Andrej Gisbrecht, Dusan Sovilj, Barbara Hammer, Amaury Lendasse

Abstract
By means of local neighborhood regression and time windows, the generative topographic mapping (GTM) makes it possible to predict and visually inspect time series data. GTM itself, however, is fully unsupervised. In this contribution, we propose an extension of relevance learning to time series regression with GTM. This way, the metric automatically adapts according to the relevant time lags, resulting in a sparser representation, improved accuracy, and smoother visualization of the data.

Manuscript from author [PDF]

[Back to Top]


Feature selection and information-based learning


ES2012-12

How regular is neuronal activity?

Lubomir Kostal, Petr Lansky, Ondrej Pokora

Abstract
We propose and investigate two information-based measures of statistical dispersion of neuronal firing: the entropy-based dispersion and the Fisher information-based dispersion. The measures are compared with the standard deviation. Although the standard deviation is used routinely, we show that it is not well suited to quantify some aspects of dispersion that are often expected intuitively, such as the degree of randomness. The proposed dispersion measures are not entirely independent, although each describes the firing regularity from a different point of view.

Manuscript from author [PDF]

ES2012-120

On the Potential Inadequacy of Mutual Information for Feature Selection

Benoît Frénay, Gauthier Doquire, Michel Verleysen

Abstract
Despite its popularity as a relevance criterion for feature selection, the mutual information can sometimes be inadequate for this task. Indeed, it is commonly accepted that a set of features maximising the mutual information with the target vector leads to a lower probability of misclassification. However, this assumption is in general not true. Justifications and illustrations of this fact are given in this paper.

Manuscript from author [PDF]

ES2012-68

Cluster homogeneity as a semi-supervised principle for feature selection using mutual information

Frederico Coelho, Antonio Padua Braga, Michel Verleysen

Abstract
In this work the principle of homogeneity between labels and data clusters is exploited in order to develop a semi-supervised feature selection method. This principle permits the use of cluster information to improve the estimation of feature relevance and thereby increase selection performance. Mutual information is used in a forward-backward search process, in this filter approach, to evaluate the relevance of each feature to the data distribution and to the existing labels, in a context of few labeled and many unlabeled instances.

Manuscript from author [PDF]

ES2012-133

Enhanced emotion recognition by feature selection to animate a talking head

Hela Daassi-Gnaba, Yacine Oussar

Abstract
It is known that deaf and hard-of-hearing people can substantially improve their lip-reading skills if they have access to the speaker's emotion. Moreover, it has been shown that animating an artificial talking head can provide this modality. In this paper, we assume that emotion recognition to animate such a talking head can be performed using a small set of relevant features extracted from the speech signal. More precisely, we show that the implementation of linear classifiers using Support Vector Machines (SVM), combined with a feature selection method, leads to promising performance, which confirms our assumption.

Manuscript from author [PDF]

ES2012-97

Range-based non-orthogonal ICA using cross-entropy method

Easter Selvan Suviseshamuthu, Amit Chattopadhyay, Umberto Amato, Pierre-Antoine Absil

Abstract
A derivative-free framework for optimizing a non-smooth range-based contrast function in order to estimate independent components is presented. The proposed algorithm employs the von Mises-Fisher (vMF) distribution to draw random samples in the cross-entropy (CE) method, thereby intrinsically maintaining the unit-norm constraint that removes the scaling indeterminacy in the independent component analysis (ICA) problem. Empirical studies involving natural images show that this approach outperforms popular schemes [1] in terms of separation performance.

Manuscript from author [PDF]

[Back to Top]


Nonlinear dimensionality reduction and topological learning


ES2012-3

Type 1 and 2 symmetric divergences for stochastic neighbor embedding

John Lee

Abstract
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction (DR) that involves softmax similarities measured between all pairs of data points. In order to build a low-dimensional embedding, SNE tries to reproduce the similarities observed in the high-dimensional data space. The capability of softmax similarities to fight the phenomenon of norm concentration has been studied in previous work. This paper investigates a complementary aspect, namely, the cost function that quantifies the mismatch between the high- and low-dimensional similarities. We show experimentally that switching from a simple Kullback-Leibler divergence to symmetric mixtures of divergences increases the quality of DR. This modification brings SNE to the performance level of its Student $t$-distributed variant, without the need to resort to non-identical similarity definitions in the high- and low-dimensional spaces. These results allow us to conclude that future improvements in similarity-based DR will likely emerge from better definitions of the cost function.

Manuscript from author [PDF]
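The divergences compared in the abstract above can be made concrete with a small sketch. The function names and the 50/50 mixture weighting below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between discrete distributions p and q."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    return float(np.sum(p * np.log(p / q)))

def symmetric_mixture(p, q, w=0.5):
    """A symmetric mixture: weighted sum of the two directed KL terms."""
    return w * kl(p, q) + (1.0 - w) * kl(q, p)

# Toy high- and low-dimensional similarity distributions (each sums to 1).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl(p, q), symmetric_mixture(p, q))
```

Unlike the single directed KL used in plain SNE, the mixture penalizes mismatches in both directions, which is the symmetrization the paper investigates.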

ES2012-25

Out-of-sample kernel extensions for nonparametric dimensionality reduction

Andrej Gisbrecht, Wouter Lueks, Bassam Mokbel, Barbara Hammer

Abstract
Nonparametric dimensionality reduction (DR) techniques such as locally linear embedding or t-distributed stochastic neighbor embedding (t-SNE) constitute standard tools to visualize high dimensional and complex data in the Euclidean plane. With increasing data volumes and streaming applications, it is often no longer possible to project all data points at once. Rather, out-of-sample extensions (OOS) derived from a small subset of all data points are used. In this contribution, we propose a kernel mapping for OOS in contrast to direct techniques based on the DR method. This can be trained based on a given example set, or it can be trained indirectly based on the cost function of the DR technique. Considering t-SNE as an example and several benchmarks, we show that a kernel mapping outperforms direct OOS as provided by t-SNE.

Manuscript from author [PDF]

ES2012-157

A generative model that learns Betti numbers from a data set

Maxime Maillot, Michaël Aupetit, Gérard Govaert

Abstract
Analysis of multidimensional data is challenging. Topological invariants can be used to summarize essential features of such data sets. In this work, we propose to compute the Betti numbers from a generative model based on a simplicial complex learnt from the data. We compare it to the Witness Complex, a geometric technique based on nearest neighbors. Our results on different data distributions with known topology show that Betti numbers are well recovered with our method.

Manuscript from author [PDF]
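As background to the abstract above: the zeroth Betti number counts connected components, and it can be estimated from a neighborhood graph over the data. The sketch below illustrates only this simplest invariant with an assumed radius parameter; the paper's generative simplicial-complex model (and higher Betti numbers) is far more involved:

```python
import numpy as np

def betti_zero(points, radius):
    """Estimate the zeroth Betti number (number of connected components)
    of a point cloud from its radius-neighborhood graph, via union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Union every pair of points closer than the chosen radius.
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= radius:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(n)})

# Two well-separated clusters -> two connected components.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.1, (20, 2)),
                   rng.normal(5, 0.1, (20, 2))])
print(betti_zero(cloud, radius=1.0))  # -> 2
```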

[Back to Top]


Recurrent and neural networks, reinforcement learning, control


ES2012-138

Highly efficient localisation utilising weightless neural systems

Ben McElroy, Michael Gillham, Gareth Howells, Sarah Spurgeon, Michael Kelly, John Batchelor, Matthew Pepper

Abstract
Efficient localisation is a highly desirable property for an autonomous navigation system. Weightless neural networks offer a real-time approach to robotics applications by reducing hardware and software requirements for pattern recognition techniques. Such networks offer the potential for objects, structures, routes and locations to be easily identified and maps constructed from fused limited sensor data as information becomes available. We show that, in the absence of concise and complex information, localisation can be obtained from data with inherent uncertainties using simple algorithms, through a combination of Genetic Algorithm techniques applied to a Weightless Neural Architecture.

Manuscript from author [PDF]

ES2012-172

The Exploration vs Exploitation Trade-Off in Bandit Problems: An Empirical Study

Bernard Manderick, Saba Yahyaa

Abstract
We compare well-known action selection policies used in reinforcement learning, such as epsilon-greedy and softmax, with lesser known ones, such as the Gittins index and the knowledge gradient, on bandit problems. The latter two perform very well in comparison. Moreover, the knowledge gradient can be generalized to other problems.

Manuscript from author [PDF]
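The two baseline policies named in the abstract above have standard textbook forms, sketched below; the parameter values are illustrative, and the Gittins index and knowledge gradient policies are more involved and not shown:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore a uniformly random arm,
    otherwise exploit the arm with the highest value estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature, rng):
    """Sample an arm with probability proportional to exp(Q / tau)."""
    prefs = np.asarray(q_values, float) / temperature
    prefs -= prefs.max()                       # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

rng = np.random.default_rng(42)
q = [0.2, 0.8, 0.5]                            # toy value estimates
print(epsilon_greedy(q, 0.1, rng), softmax_action(q, 0.5, rng))
```

Both policies trade exploration against exploitation through a single tunable parameter (epsilon or the temperature tau), which is what makes them convenient baselines in an empirical comparison.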

ES2012-15

Intrinsic plasticity via natural gradient descent

Klaus Neumann, Jochen J. Steil

Abstract
This paper introduces the natural gradient for intrinsic plasticity, which tunes a neuron’s activation function such that its output distribution becomes exponentially distributed. The information-geometric properties of the intrinsic plasticity potential are analyzed and the improved learning dynamics when using the natural gradient are evaluated for a variety of input distributions. The applied measure for evaluation is the relative geodesic length of the respective path in parameter space.

Manuscript from author [PDF]

ES2012-24

Complex Valued Artificial Recurrent Neural Network as a Novel Approach to Model the Perceptual Binding Problem

Alexey Minin, Alois Knoll, Hans-Georg Zimmermann

Abstract
The brain is constantly faced with the task of grouping together features of objects that it perceives, in order to arrive at a coherent representation of these objects. Such features are, for example, shape, motion, color, depth, but also other aspects of perception. There is experimental evidence and a large body of theoretical work that supports the hypothesis that brains solve this so-called “binding” problem by synchronizing the temporal firing patterns in neuronal assemblies, with neurons that are sensitive to different features. According to this hypothesis, temporal correlations between neuronal impulses represent the fact that different perceived features have to be associated with one and the same object. In this paper we suggest a new model for solving the binding problem by introducing complex-valued recurrent networks. These networks can represent sinusoidal oscillations and their phase, i.e., they can model the binding problem of neuronal assemblies by adjusting the relative phase of the oscillations of different feature detectors. As feature examples, we use color and shape, but the network would also function with any combination of other features. The suggested network architecture performs image generalization but can also be used as an image memory. The information about object color is represented in the phase of the network weights, while the spatial distribution of the neurons encodes the object's shape. We show that the architecture can generalize object shapes and recognize object color with very low computational overhead.

Manuscript from author [PDF]

ES2012-90

A discrete/rhythmic pattern generating RNN

Tim Waegeman, Francis Wyffels, Benjamin Schrauwen

Abstract
Biological research supports the concept that advanced motion emerges from modular building blocks, which generate both rhythmical and discrete patterns. Inspired by these ideas, roboticists try to implement such building blocks using different techniques. In this paper, we show how to build such a module by using a recurrent neural network (RNN) to encapsulate both discrete and rhythmical motion patterns in a single network. We evaluate the proposed system on a planar robotic manipulator. For training, we record several handwriting motions by back-driving the robot manipulator. Finally, we demonstrate the ability to learn multiple motions (both discrete and rhythmic) and evaluate the robustness of the pattern generation in the presence of perturbations.

Manuscript from author [PDF]

ES2012-102

Fast calibration of hand movements-based interface for arm exoskeleton control

Hugo Martin, Sylvain Chevallier, Eric Monacelli

Abstract
Several muscular degenerative diseases alter the motor abilities of large muscles but spare smaller ones, e.g. keeping hand motor skills relatively unaffected while upper-limb abilities are altered. Thus, hand movements could be used to control an arm exoskeleton for rehabilitation and assistive purposes. Using an infrared (IR) sensor-based interface for exoskeleton control, this paper describes the learning part of the system, endowing it with fast online calibration and adaptation abilities. This learning component shows good results and has been successfully implemented on the real system.

Manuscript from author [PDF]

ES2012-179

Manifold-based non-parametric learning of action-value functions

Hunor Jakab, Lehel Csato

Abstract
Finding good approximations to state-action value functions is a central problem in model-free on-line reinforcement learning. The use of non-parametric function approximators enables us to simultaneously represent model and confidence. Q functions are often discontinuous, and we present a novel Gaussian process (GP) kernel function to cope with this problem. We use a manifold-based distance measure in our kernels, the manifold being induced by the graph structure extracted from data. With on-line learning, graph construction proceeds in parallel with the main algorithm. This results in a compact and efficient graph structure, eliminates the need for predefined basis functions and improves the accuracy of estimated value functions, as tested on simulated robotic control tasks.

Manuscript from author [PDF]

ES2012-174

Recurrent Neural State Estimation in Domains with Long-Term Dependencies

Siegmund Duell, Lina Weichbrodt, Alexander Hans, Steffen Udluft

Abstract
This paper presents a state estimation approach for reinforcement learning (RL) of a partially observable Markov decision process. It is based on a special recurrent neural network architecture, the Markov decision process extraction network with shortcuts (MPEN-S). In contrast to previous work regarding this topic, we address the problem of long-term dependencies, which cause major problems in many real-world applications. The architecture is designed to model the reward-relevant dynamics of an environment and is capable of condensing large sets of continuous observables to a compact Markovian state representation. The resulting estimate can be used as input for RL methods that assume the underlying system to be a Markov decision process. Although the approach was developed with RL in mind, it is also useful for general prediction tasks.

Manuscript from author [PDF]

ES2012-82

Using event-based metric for event-based neural network weight adjustment

Thierry Vieville, Rodrigo Salas, Bruno Cessac

Abstract
The problem of adjusting the parameters of an event-based network model is addressed here at the programmatic level. Considering temporal processing, the goal is to adjust the weights of the network units so that the output events correspond to what is desired. The present work proposes a way to adapt, in the deterministic and discrete case, usual alignment metrics in order to derive suitable adjustment rules. At the numerical level, the stability and unbiasedness of the method are verified.

Manuscript from author [PDF]

[Back to Top]


Parallel hardware architectures for acceleration of neural network computation


ES2012-8

Parallel neural hardware: the time is right

Ulrich Rückert, Erzsebet Merenyi

Abstract
It seems obvious that the massively parallel computations inherent in artificial neural networks (ANNs) can only be realized by massively parallel hardware. However, the vast majority of the many ANN applications simulate their ANNs on sequential computers, which is not resource-efficient. The increasing availability of parallel standard hardware such as FPGAs, graphics processors, and multi-core processors offers new opportunities and challenges with respect to resource efficiency and real-time applications of ANNs. Within this paper we discuss some key issues for parallel ANN implementation on these standard devices compared to special-purpose ANN implementations.

Manuscript from author [PDF]

ES2012-44

Towards biologically realistic multi-compartment neuron model emulation in analog VLSI

Sebastian Millner, Andreas Hartel, Johannes Schemmel, Karlheinz Meier

Abstract
We present a new concept for multi-compartment emulation on neuromorphic hardware based on the BrainScaleS wafer-scale system. The implementation features complex dendrite routing capabilities, realistic scaling of compartmental parameters and active spike propagation. Simulations prove the circuit's capability of reproducing the passive dendritic properties of a model from the literature.

Manuscript from author [PDF]

ES2012-35

A GPU-accelerated algorithm for self-organizing maps in a distributed environment

Peter Wittek, Sándor Darányi

Abstract
In this paper we introduce a MapReduce-based implementation of self-organizing maps that performs compute-bound operations on distributed GPUs. The kernels are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes with two NVIDIA Tesla M2050 GPUs attached to each, and we achieve a 10x speedup for self-organizing maps over a distributed CPU algorithm.

Manuscript from author [PDF]
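The compute-bound operation that such a distributed GPU implementation accelerates is, in essence, the dense distance and neighborhood computation of a batch SOM update. The single-node NumPy sketch below illustrates that kernel under assumed names and parameters; it is not the paper's GPU code:

```python
import numpy as np

def batch_som_step(weights, data, sigma, grid):
    """One batch SOM update: find the best-matching unit (BMU) for every
    sample, then move each map unit toward the neighbourhood-weighted mean
    of the data. The dense pairwise distances computed here are the
    compute-bound part a GPU/MapReduce implementation would distribute."""
    # Pairwise squared distances between samples and map units.
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
    bmu = d2.argmin(axis=1)                    # BMU index per sample
    # Gaussian neighbourhood on the map grid, centred on each sample's BMU.
    g2 = ((grid[bmu][:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    h = np.exp(-g2 / (2 * sigma ** 2))         # (n_samples, n_units)
    return h.T @ data / h.sum(axis=0)[:, None] # updated unit weights

rng = np.random.default_rng(1)
grid = np.array([[i, j] for i in range(4) for j in range(4)], float)
weights = rng.random((16, 3))                  # 4x4 map, 3-D inputs
data = rng.random((100, 3))
weights = batch_som_step(weights, data, sigma=1.0, grid=grid)
print(weights.shape)  # (16, 3)
```

Because every sample-unit pair is independent, the distance matrix maps naturally onto GPU threads, which is what makes this formulation attractive for the kind of cluster described above.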

ES2012-54

Low-Power Manhattan Distance Calculation Circuit for Self-Organizing Neural Networks Implemented in the CMOS Technology

Rafal Dlugosz, Tomasz Talaska, Witold Pedrycz, Pierre-Andre Farine

Abstract
The paper presents an analog, current-mode circuit that calculates the distance between the neuron weight vectors W and the input learning patterns X. The circuit can be used as a component of different self-organizing neural networks (NN) implemented at the transistor level in CMOS technology. The same circuit can calculate the distance between the X and W vectors in Self-Organizing Maps (SOM) as well as in NNs using the Neural Gas or Winner Takes All (WTA) learning algorithms, which makes it a universal solution. Earlier detailed simulations carried out by means of a software model of the WTA NN and the Kohonen SOM showed that the Euclidean (L2) and the Manhattan (L1) distance measures lead to similar learning results. For this reason, the L1 measure has been implemented, as in this case the circuit is much simpler than in the L2 case, resulting in very low chip area and low power dissipation. This enables including even large NNs in miniaturized portable devices, such as sensors in Wireless Sensor Networks (WSN) or Wireless Body Area Networks (WBAN).

Manuscript from author [PDF]
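The observation motivating the L1 choice above, that Manhattan and Euclidean distances tend to pick the same winning neuron, is easy to probe with a software analogue of the circuit. This sketch is illustrative only (random data, assumed dimensions), not the analog implementation:

```python
import numpy as np

def winner(x, W, metric):
    """Index of the neuron whose weight vector is closest to input x."""
    if metric == "L1":                     # Manhattan: cheap in hardware
        d = np.abs(W - x).sum(axis=1)
    else:                                  # L2: needs squaring/multiplication
        d = ((W - x) ** 2).sum(axis=1)
    return int(d.argmin())

rng = np.random.default_rng(3)
W = rng.random((50, 8))                    # 50 neurons, 8-D weight vectors
agree = sum(winner(x, W, "L1") == winner(x, W, "L2")
            for x in rng.random((1000, 8)))
print(f"L1 and L2 picked the same winner in {agree}/1000 cases")
```

In hardware, L1 needs only subtraction, absolute value and summation of currents, whereas L2 requires multipliers, which is why the agreement between the two measures translates directly into chip-area and power savings.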

ES2012-71

Parallelization of Deep Networks

Michele De Filippo De Grazia, Ivilin Stoianov, Marco Zorzi

Abstract
Learning multiple levels of feature detectors in Deep Belief Networks is a promising approach both for neuro-cognitive modeling and for practical applications, but it comes at the cost of high computational requirements. Here we propose a method for the parallelization of unsupervised generative learning in deep networks based on distributing training data among multiple computational nodes in a cluster. We show that this approach almost linearly reduces the training time with very limited cost on performance.

Manuscript from author [PDF]

ES2012-110

Hardware accelerated real time classification of hyperspectral imaging data for coffee sorting

Andreas Backhaus, Jan Lachmair, Ulrich Rückert, Udo Seiffert

Abstract
Hyperspectral imaging has been proven to be a viable tool for automated food inspection that is non-invasive and on-line capable. In this contribution a hardware-implemented Self-Organizing Feature Map with Conscience (C-SOM) is presented that is capable of on-line adaptation and recall in order to learn to classify green coffee varieties as well as coffee of different roast stages. The C-SOM showed favourable results on several datasets compared to a number of classical supervised neural network classifiers. The massively parallel neural hardware architecture allows for constant processing times at different map sizes.

Manuscript from author [PDF]

ES2012-137

Implementation Issues of Kohonen Self-Organizing Map Realized on FPGA

Rafal Dlugosz, Marta Kolasa, Michal Szulc, Witold Pedrycz, Pierre-Andre Farine

Abstract
This paper presents investigations showing the impact of the bit length of data signals in hardware-implemented Kohonen Self-Organizing Maps (SOM) on the quality of the learning process. The aim of this work was to determine the allowable reduction of the number of bits in particular signals that does not deteriorate the network behavior. The efficiency of the learning process has been quantified using the quantization error. The results obtained for a SOM realized on a Field Programmable Gate Array (FPGA), as well as by means of a software model of the SOM, show that the smallest allowable resolution (expressed in bits) of the weight signals equals seven, while the minimal bit length of the neighborhood signal ranges from 3 to 6 (depending on the map topology). For these values, and properly selected values of other parameters, the learning process remains undisturbed. Reducing the number of bits influences the number of neurons that can be synthesized on a single FPGA device.

Manuscript from author [PDF]
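The trade-off studied above, bit length versus quantization error, can be illustrated with a simple software experiment. The uniform quantizer and the bit widths below are generic illustrations, not the paper's FPGA signal formats:

```python
import numpy as np

def quantize(w, bits):
    """Uniformly quantize values in [0, 1] to the given number of bits."""
    levels = 2 ** bits
    return np.round(w * (levels - 1)) / (levels - 1)

rng = np.random.default_rng(7)
w = rng.random(10000)                      # stand-in for SOM weight signals
for bits in (3, 5, 7, 10):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits:2d} bits -> mean quantization error {err:.5f}")
```

Each extra bit roughly halves the quantization step, so the mean error drops geometrically with bit length, while every saved bit frees FPGA resources for additional neurons, which is the tension the paper quantifies.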

ES2012-146

A hybrid CMOS/memristive nanoelectronic circuit for programming synaptic weights

Arne Heittmann, Tobias G. Noll

Abstract
In this paper a hybrid circuit is presented which comprises nanoelectronic resistive switches based on the electrochemical memory effect (ECM) as well as devices from a standard 40 nm CMOS process. A closed ECM device model based on device physics was used for simulations, allowing a precise prediction of the expected I-V characteristics. The device is used as a non-volatile and/or programmable synapse in a neuromorphic architecture. Expected performance figures such as write time and robustness with regard to variations of supply voltage and timing errors are derived. The results show that ECM cells are promising devices for hybrid neuromorphic systems.

Manuscript from author [PDF]

ES2012-161

gNBXe -- a Reconfigurable Neuroprocessor for Various Types of Self-Organizing Maps

Jan Lachmair, Erzsebet Merenyi, Mario Porrmann, Ulrich Rückert

Abstract
In this paper we present the FPGA-based hardware accelerator gNBXe for emulation of classical Self-Organizing Maps (SOMs) and Conscience SOM (CSOM) in a multi-FPGA environment. After discussing how the CSOM is mapped to a resource-efficient digital hardware implementation, we present how the modular system architecture can be flexibly adapted to various application datasets. The hardware costs and scalability of a multi-FPGA based accelerator using Xilinx Virtex2 and Virtex4 FPGAs are discussed. Compared to a state-of-the-art multi-core PC, a speedup of 9.1 is achieved for a CSOM with 4,840 neurons and 196 synaptic weights.

Manuscript from author [PDF]

[Back to Top]