ESANN2011

19th European Symposium on Artificial Neural Networks
Bruges, Belgium, April 27-28-29

[Electronic proceedings home page] [Electronic proceedings author index]

ESANN2011
Content of the proceedings

WARNING: you need Adobe Acrobat reader 7.0 or more to view the PDF files below



Information theory related learning


ES2011-7

Information theory related learning

Thomas Villmann, Jose C. Principe, Andrej Cichocki

Abstract
This is the introduction paper to a special session held on ESANN conference 2011. It reviews and highlights recent developments and new direction in information related learning, which is a fastly developing research area. These algorithms are based on the fundamental principles of information theory and relate them implicitly or explicitly to learning algoithms and strategies.

Manuscript from author [PDF]

ES2011-100

Optimization of Parametrized Divergences in Fuzzy c-Means

Tina Geweniger, Marika Kaestner, Thomas Villmann

Abstract
We propose the utilization of divergences as dissimilarity measure in the Fuzzy c-Means algorithm for the clustering of functional data. Further we adapt the relevance parameter to improve the data representation and therefore obtain more accurate clusterings in terms of separation and compactness. We show for two example applications that this method leads to improved performance.

Manuscript from author [PDF]

ES2011-66

Multivariate class labeling in Robust Soft LVQ

Petra Schneider, Tina Geweniger, Frank-Michael Schleif, Michael Biehl, Thomas Villmann

Abstract
We introduce a generalization of Robust Soft Learning Vector Quantization (RSLVQ). This algorithm for nearest prototype classification is derived from an explicit cost function and follows the dynamics of a stochastic gradient ascent. We generalize the RSLVQ cost function with respect to vectorial class labels: Probabilistic LVQ (PLVQ) allows to realize multivariate class memberships for prototypes and training samples, and the prototype labels can be learned from the data during training. We present experiments to demonstrate the new algorithm in practice.

Manuscript from author [PDF]

ES2011-11

Statistical dependence measure for feature selection in microarray datasets

Veronica Bolon-Canedo, Sohan Seth, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos, Jose C. Principe

Abstract
Feature selection is the domain of machine learning which studies data-driven methods to select, among a set of input variables, the ones that will lead to the most accurate predictive model. In this paper, a statistical dependence measure is presented for variable selection in the context of classification. Its performance is tested over DNA microarray data, a challenging dataset for machine learning researchers due to the high number of genes and relatively small number of measurements. This measure is compared against the so called mRMR approach, and is shown to obtain better or equal performance over the binary datasets.

Manuscript from author [PDF]

ES2011-72

Mathematical Foundations of the Self Organized Neighbor Embedding (SONE) for Dimension Reduction and Visualization

Kerstin Bunte, Frank-Michael Schleif, Sven Haase, Thomas Villmann

Abstract
In this paper we propose the generalization of the recently introduced Neighbor Embedding Exploratory Observation Machine (NE-XOM) for dimension reduction and visualization. We provide a general mathematical framework called Self Organized Neighbor Embedding (SONE). It treats the components, like data similarity measures and neighborhood functions, independently and easily changeable. And it enables the utilization of different divergences, based on the theory of Fr\'echet derivatives. In this way we propose a new dimension reduction and visualization algorithm, which can be easily adapted to the user specific request and the actual problem.

Manuscript from author [PDF]

[Back to Top]


Self-organizing maps and recurrent networks


ES2011-71

Sparsity Issues in Self-Organizing-Maps for Structures

Markus Hagenbuchner, Giovanni Da San Martino, Ah Chung Tsoi, Alessandro Sperduti

Abstract
Recent developments with Self-Organizing Maps (SOMs) produced methods capable of clustering graph structured data onto a fixed dimensional display space. These methods have been applied successfully to a number of benchmark problems and produced state-of-the-art results. This paper discusses a limitation of the most powerful version of these SOMs, known as probability measure graph SOMs (PMGraphSOMs), viz., the sparsity induced by processing a large number of small graphs, which prevents a successful application of PMGraphSOM to such problems. An approach using the idea of compactifying the generated state space to address this sparsity problem is proposed. An application to an established benchmark problem, viz., the Mutag dataset in toxicology will show that the proposed method is effective when dealing with a large number of small graphs. Hence, this work fills a gap between the processing of a number of small graphs, and the processing of densely connected graphs using PMGraphSOMs.

Manuscript from author [PDF]

ES2011-26

Multi-Goal Path Planning Using Self-Organizing Map with Navigation Functions

Jan Faigl, Jan Mačák

Abstract
In this paper, we present a combination of Self-Organizing Map (SOM) approach and navigation functions in the Traveling Salesman Problem (TSP) with segment goals. The problem arises from the inspection planning, where a path from which all points of the given polygonal environment have to be ``seen''. This type of problem belongs to the TSP with neighborhoods family. Moreover, the problem has to also deal with paths among obstacles, thus the problem is not the Euclidean TSP. The proposed approach demonstrates applicability of SOM principles in such problems in which SOM has not yet been applied.

Manuscript from author [PDF]

ES2011-40

Statistical properties of the `Hopfield estimator' of dynamical systems

Miguel Atencia, Gonzalo Joya

Abstract
This paper analyses the statistical properties of a method for estimating the parameters of systems defined by ordinary differential equations. In previous work, this estimation method was defined as an adapted version of Hopfield neural networks, and its convergence and robustness with respect to signal disturbances were proved, even when parameters are time-varying. This contribution aims at establishing the behaviour of the estimation error by performing a set of simulations where a random noise with known probability distribution is added to signals. It is shown that, asymptotically, the estimator is unbiased and its variance vanishes. Further theoretical work is being undertaken in order to rigourously support these empirical findings.

Manuscript from author [PDF]

ES2011-31

Negatively Correlated Echo State Networks

Ali Rodan, Peter Tino

Abstract
Echo State Network (ESN) is a special type of recurrent neural network with fixed random recurrent part (reservoir) and a trainable reservoir-to-output readout mapping (typically obtained by linear regression). In this work we utilise an ensemble of ESNs with diverse reservoirs whose collective read-out is obtained through Negative Correlation Learning (NCL) of ensemble of Multi-Layer Perceptrons (MLP), where each individual MPL realises the readout from a single ESN. Experimental results on three data sets confirm that, compared with both single ESN and flat ensembles of ESNs, NCL based ESN ensembles achieve better generalisation performance.

Manuscript from author [PDF]

ES2011-48

Reservoir regularization stabilizes learning of Echo State Networks with output feedback

R. Felix Reinhart, Jochen J. Steil

Abstract
Output feedback is crucial for autonomous and parameterized pattern generation with reservoir networks. Read-out learning can lead to error amplification in these settings and therefore regularization is important for both generalization and reduction of error amplification. We show that regularization of the inner reservoir network mitigates parameter dependencies and boosts the task-specific performance.

Manuscript from author [PDF]

[Back to Top]


Semi-supervised learning


ES2011-57

A Multi-kernel Framework for Inductive Semi-supervised Learning

Xilan TIAN, Gilles Gasso, Stéphane Canu

Abstract
We investigate the benefit of combining both cluster assumption and manifold assumption underlying most of the semi-supervised algorithms using the flexibility and the efficiency of multi-kernel learning. The multiple kernel version of Transductive SVM (a cluster assumption based approach) is proposed and it is solved based on DC (Difference of Convex functions) programming. Experiments on the benchmark data sets indicate the promising results of the proposed work.

Manuscript from author [PDF]

ES2011-80

Training of multiple classifier systems utilizing partially labeled sequential data sets

Martin Schels, Patrick Schillinger, Friedhelm Schwenker

Abstract
Making use of unlabeled data samples for training a classifier is a desirable aim for many real world applications in pattern recognition. In this study, a multiple classifier system is utilized to investigate this matter. Further, cluster analysis is used in order to group the available data while neglecting the actual labels. Then, by implementing an information fusion architecture based on these clusters, a classification architecture is constructed. This kind of an architecture is investigated by means of a facial expression data collection with focusing on one-against-one class decisions to produce locally “unlabeled”, i.e. not assigned to one of the considered classes, data.

Manuscript from author [PDF]

[Back to Top]


Computational Intelligence in Life Sciences


ES2011-5

Recent trends in computational intelligence in life sciences

Udo Seiffert, Frank-Michael Schleif, Dietlind Zühlke

Abstract

Manuscript from author [PDF]

ES2011-36

Adaptive Kernel Smoothing Regression for Spatio-Temporal Environmental Datasets

Federico Montesino Pouzols, Amaury Lendasse

Abstract
This paper describes a method for performing kernel smoothing regression in an incremental, adaptive manner. A simple and fast combination of incremental vector quantization with kernel smoothing regression using adaptive bandwidth is shown to be effective for online regression modeling of environmental datasets. The method is illustrated on openly available datasets corresponding to the Tropical Atmosphere Ocean array and the Helsinki Commission hydrographical database for the Baltic Sea.

Manuscript from author [PDF]

ES2011-18

Generalized functional relevance learning vector quantization

Marika Kaestner, Barbara Hammer, Michael Biehl, Thomas Villmann

Abstract
Generalized learning vector quantization (GRLVQ) is a prototype based classification algorithm with metric adaptation weighting each data dimensions according to their relevance for the classification task. We present in this paper an extension for functional data, which are usually very high dimensional. This approach supposes the data vectors have to be functional representations. Taking into account, these information the so-called relevance profile are modeled by superposition of simple basis functions depending on only a few parameters. As a consequence, the resulting functional GRLVQ has drastically reduced number of parameters to be adapted for relevance learning. We demonstrate the ability of the new algorithms for standard functional data sets using different basis functions, namely Gaussians and Lorentzians.

Manuscript from author [PDF]

ES2011-89

Patch Affinity Propagation

Xibin Zhu, Barbara Hammer

Abstract
Affinity propagation constitutes an exemplar based clustering technique which reliably optimizes the quantization error given a matrix of pairwise data dissimilarities by means of the max-sum algorithm for factor graphs. Albeit very efficient for sparse matrices, it displays squared complexity in the worst case, hence it is not suited as high throughput method due to time and memory constraints. We propose an extension of affinity propagation to patch clustering such that data are treated in chunks of fixed size with limited memory requirements and linear time. We test the suitability of the approach for two biomedical applications.

Manuscript from author [PDF]

ES2011-20

Multispectral image characterization by partial generalized covariance

Marc Strickert, Björn Labitzke, Andreas Kolb, Thomas Villmann

Abstract
A general method is presented for the assessment of data attribute variability, which plays an important role in initial screening of multi- and high-dimensional data sets. Instead of the commonly used second centralized moment, known as variance, the proposed method allows a mathematically rigorous characterization of attribute sensitivity given not only Euclidean distances but partial data comparisons by general similarity measures. Depending on the choice of measure different spectral features get highlighted by attribute assessment, this way creating new image segmentation aspects, as shown in a comparison of Euclidean distance, Pearson correlation and gamma divergence applied to multi-spectral images.

Manuscript from author [PDF]

[Back to Top]


Learning I


ES2011-14

Increased robustness and intermittent dynamics in structured Reservoir Networks with feedback

Sarah Jarvis, Stefan Rotter, Ulrich Egert

Abstract
Recent studies using feedforward Echo State Networks (ESN) demonstrate that reservoir stability can be strongly affected by reservoir substructures, such as clusters. Here, we evaluate the impact of including feedback on clustered ESNs and assert that certain cluster configurations extend the permissible range of spectral radius values. We also report a new class of reservoir activity: intermittent dynamics, characterized by variable periods of chaotic activity before returning to quiescent behaviour. Using a non-linear benchmark data set, we establish clustered ESNs have comparable performance against conventional ESNs, but importantly display an increased tolerance and robustness to spectral radius and input choice.

Manuscript from author [PDF]

ES2011-69

Anticipating Rewards in Continuous Time and Space with Echo State Networks and Actor-Critic Design

Mohamed Oubbati

Abstract
In this paper we implement echo state networks (ESNs) within the concept of actor-critic design to obtain optimal control policy for a mobile robot. The robot is asked to anticipate future rewards/punishments and react accordingly. In this design, the ESN critic has to consider continuous state/action space, deal with uncertainties, and be computationally cheap in order to deal with real-world constraints. Benefits and problems involved in a obstacle avoidance scenario are discussed, supported by real-world experiments.

Manuscript from author [PDF]

ES2011-60

Application of stochastic recurrent reinforcement learning to index trading

Denise Gorse

Abstract
A novel stochastic adaptation of the recurrent reinforcement learning (RRL) methodology is applied to daily, weekly, and monthly stock index data, and compared to results obtained elsewhere using genetic programming (GP). The data sets used have been a considered a challenging test for algorithmic trading. It is demonstrated that RRL can reliably outperform buy-and-hold for the higher frequency data, in contrast to GP which performed best for monthly data.

Manuscript from author [PDF]

ES2011-110

A brief tutorial on reinforcement learning: The game of Chung Toi

Christopher Gatti, Jonathan Linton, Mark Embrechts

Abstract
This work presents a simple implementation of reinforcement learning, using the temporal difference algorithm and a neural network, applied to the board game of Chung Toi, which is a challenging variation of Tic-Tac-Toe. The implementation of this learning algorithm is fully described and includes all parameter settings and various techniques to improve the ability of the network to learn the board game. With relatively little training, the network was able to win nearly 90% of games played against a 'smart' random opponent.

Manuscript from author [PDF]

ES2011-62

D-VisionDraughts: a draughts player neural network that learns by reinforcement in a high performance environment

Araujo Barcelos Ayres Roberto, Silva Julia Rita Maria, Matias Junior Rivalino

Abstract
This paper describes D-VisionDraughts, a distributed player agent for draughts which is based on Neural Networks trained by Temporal Differences. D-VisionDraughs is trained in a high performance environment and achieves a high level of play without expert game analysis and with minimum human intervention. D-VisionDraughts corresponds to a distributed version of the efficient agent player VisionDraughts. In this way, the main contributions of this paper consist on substituting the distributed Young Brothers Wait Concept algorithm (YBWC) for the serial alpha-beta search algorithm used in VisionDraughts and on measuring the impact of a high performance environment into the non supervised learning abilities of the player. Evaluative tests proved that even a modest distributed version counting on just 10 processors is able to reduce from 68% the search runtime and to increase from 15% its capacity of winning

Manuscript from author [PDF]

ES2011-58

SO-VAT: Self-Organizing Visual Assessment of cluster Tendency for large data sets

Enrique Pelayo, Carlos Orrite, David Buldain

Abstract
A new method, Self-Organizing Visual Assessment of cluster Tendency (SO-VAT), is given for visually assessing the cluster tendency in large data sets. It is based on training a SOM with the input samples, and then calculating the VAT image from a selected group of the generated neurons, selection that is done according to a certain density of activation. Tests with synthetic and real examples demonstrate that the new SO-VAT algorithm results in clearer images and shorter computing time than applying directly the VAT procedure to the whole input-data set.

Manuscript from author [PDF]

ES2011-104

New conditioning model for robots

Jean Marc Salotti

Abstract
We present a neural network for the prediction of rewards in a conditioning model. It is based on two noisy-or and one noisy-and nodes and update rules inspired from BANNER technique. Interesting results are presented with application to robotics.

Manuscript from author [PDF]

ES2011-94

Stability of Neural Network Control for Uncertain Sampled-Data Systems

Pornchai Khlaeo-om, Sasikanchana Yenaeng, Sunya Pasuk, Supachai Aroonpun, Sompun Aumpawan

Abstract
This paper derives robust stability conditions for neural network control of sampled-data systems whose parameters are uncertain. The controllers are nonlinear, full state regulators implemented as single hidden layer, feedforward neural networks. The controlled systems must be locally controllable and full-state accessible. The robust stability is confirmed by the existence of a Lyapunov function of the closed loop systems. A modified backpropagation algorithm with a model reference technique is employed to determine the weights of the controllers. Simulation results on the classical motor-driven inverted pendulum model are presented to demonstrate the applications of these conditions.

Manuscript from author [PDF]

ES2011-55

Thresholds tuning of a neuro-symbolic net controlling a behavior-based robotic system

Mariacarla Staffa, Silvia Rossi, Massimo De Gregorio, Ernesto Burattini

Abstract
In this paper we present the results obtained by adopting an evolutionary approach to tune some critical neuron thresholds of a neuro-symbolic net that regulates the overall emergent behavior of a behavior-based robotic system.

Manuscript from author [PDF]

ES2011-101

Ensemble Usage for More Reliable Policy Identification in Reinforcement Learning

Alexander Hans, Steffen Udluft

Abstract
Reinforcement learning (RL) methods employing powerful function approximators like neural networks have become an interesting approach for optimal control. Since they learn a policy from observations, they are also applicable when no analytical description of the system is available. Although impressive results have been reported, their handling in practice is still hard, as they can fail at reliably determining a good policy. In previous work, we used ensembles of policies from independent runs of neural fitted Q-iteration (NFQ) to produce successful policies more reliable. In this paper we evaluate the approach on more problems and propose to form ensembles from successive iterations of a single NFQ run as a computationally cheap alternative to completely independent runs.

Manuscript from author [PDF]

ES2011-16

Fisherman learning algorithm of the SOM realized in the CMOS technology

Rafal Dlugosz, Marta Kolasa, Witold Pedrycz

Abstract
This study presents an idea of transistor level realization of the fisherman learning algorithm of Self-Organizing Maps (SOMs). The realization of this algorithm in hardware calls for a soluion of several specific problems not present in software implementation. The main problem is related to an iterative nature of the adaptation process of the neighboring neurons positioned at particular rings surrounding the winning neuron. This makes the circuit structure of the SOM very complex. To come up with a feasible realization, we introduce some modifications to the original fisherman algorithm. Detailed simulations of the software model of the SOM show that these modifications do not have the negative impact on the learning process, and helps bring significant reduction of the circuit complexity. In consequence, a fully parallel adaptation of all neurons is possible, which makes the SOM very fast.

Manuscript from author [PDF]

ES2011-12

Abstract category learning

Atsushi Hashimoto, Haruo Hosoya

Abstract
Motivated by a neurophysiological experiment on prefrontal cortex, we study a scheme for learning abstract categories. An abstract category represents a set of vectors that are identical to each other modulo substitution, e.g., 'ABAB', 'BABA', 'ACAC', etc. We employ a clustering-based unsupervised learning method for such abstract categories, in which the recognition step is reduced to the problem of maximal perfect weight matching. Our simulations using artificial inputs confirm that the scheme learns abstract categories robustly even with a certain level of noise in the inputs.

Manuscript from author [PDF]

ES2011-97

Symbolic computing of LS-SVM based models

Siamak Mehrkanoon, Li Jiang, Carlos Alzate, J.A.K. Suykens

Abstract
This paper introduces a software tool emph{SYM-LS-SVM-SOLVER} written in emph{Maple} to derive the dual system and the dual model representation of LS-SVM based models, symbolically. emph{SYM-LS-SVM-SOLVER} constructs the Lagrangian from the given objective function and list of constraints. Afterwards it obtains the KKT (Karush-Kuhn-Tucker) optimality conditions and finally formulates a linear system in terms of the dual variables. The effectiveness of the developed solver is illustrated by applying it to a variety of problems involving LS-SVM based models.

Manuscript from author [PDF]

ES2011-43

Sparse LS-SVMs with L0–norm minimization

J. López, K. De Brabanter, J.R. Dorronsoro, J.A.K. Suykens

Abstract
Least-Squares Support Vector Machines (LS-SVMs) have been successfully applied in many classification and regression tasks. Their main drawback is the lack of sparseness of the final models. Thus, a procedure to sparsify LS-SVMs is a frequent desideratum. In this paper, we adapt to the LS-SVM case a recent work for sparsifying classical SVM classifiers, which is based on an iterative approximation to the L0-norm. Experiments on real-world classification and regression datasets illustrate that this adaptation achieves very sparse models, without significant loss of accuracy compared to standard LS-SVMs or SVMs.

Manuscript from author [PDF]

ES2011-29

A post-processing strategy for SVM learning from unbalanced data

Haydemar Núñez, Luis Gonzalez-Abril, Cecilio Angulo

Abstract
Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on the Fisher's discriminant analysis, a post-processing strategy is introduced to deal datasets with significant imbalance in the data distribution. Empirical results from experiments for a learned SVM model on twelve UCI datasets indicates that the proposed solution improves the original SVM, and they also improve those reported when using a z-SVM, in terms of g-mean and sensitivity.

Manuscript from author [PDF]

ES2011-87

Clustering data streams with weightless neural networks

Douglas de O. Cardoso, Priscila M. V. Lima Lima, Massimo De Gregorio, João Gama, Felipe M. G. França

Abstract
Producing good quality clustering of data streams in real time is a difficult problem, since it is necessary to perform the analysis of data points arriving in a continuous style, with the support of quite limited computational resources. The incremental and evolving nature of the resulting clustering structures must reflect the dynamics of the target data stream. The WiSARD weightless perceptron, and its associated DRASiW extension, are intrinsically capable of, respectively, performing one-shot learning and producing prototypes of the learnt categories. This work introduces a simple generalization of RAM-based neurons in order to explore both weightless neural models in the data stream clustering problem.

Manuscript from author [PDF]

ES2011-49

Locating Anomalies Using Bayesian Factorizations and Masks

Li Yao, Amaury Lendasse, Francesco Corona

Abstract
A plethora of methods have been developed to handle anomaly detection in various application domains. This work focuses on locating anomalies inside a categorical data set without assuming any specific domain knowledge. By exploiting the conditional dependence and independence relationships among data attributes, not only can data analysts recognize the anomaly, but also locate the potentially anomalous attributes inside an anomalous instance following its masks. Masks are geometrically generated based on the factorization of the joint probability from a Bayesian network automatically learnt from the given data set.

Manuscript from author [PDF]

ES2011-42

Comparison of the Complex Valued and Real Valued Neural Networks Trained with Gradient Descent and Random Search Algorithms

Hans-Georg Zimmermann, Alexey Minin, Viktoria Kusherbaeva

Abstract
Complex Valued Neural Network is one of the open topics in the machine learning society. In this paper we will try to get through the problems of the complex valued neural networks. The outcome of the current research is the combined global-local algorithm for training the complex valued feed forward neural network which is appropriate for the considered chaotic problem.

Manuscript from author [PDF]

[Back to Top]


Seeing is believing: The importance of visualization in real-world machine learning applications


ES2011-3

Seeing is believing: The importance of visualization in real-world machine learning applications

Alfredo Vellido, José D Martín, Fabrice Rossi, Paulo Lisboa

Abstract
The increasing availability of data sets with a huge amount of information, coded in many different features, justifies the research on new methods of knowledge extraction: the great challenge is the translation of the raw data into useful information that can be used to improve decision-making processes, detect relevant profiles, find out relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, what makes visualization methods be likely the most appealing and one of the most relevant kinds of knowledge extration methods. At ESANN 2011, the special session ``Seeing is believing: The importance of visualization in real-world machine learning applications'' reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and the near future of visualization methods within the framework of Machine Learning.

Manuscript from author [PDF]

ES2011-24

Hierarchical clustering for graph visualization

Stéphan Clémençon, Hector De Arazoza, Fabrice Rossi, Viet Chi Tran

Abstract
This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.

Manuscript from author [PDF]

ES2011-44

A probabilistic approach to the visual exploration of G Protein-Coupled Receptor sequences

Alfredo Vellido, Martha Ivón Cárdenas, Iván Olier, Xavier Rovira, Jesús Giraldo

Abstract
The study of G protein-coupled receptors (GPCRs) is of great interest in pharmaceutical research, but only a few of their 3D structures are known at present. On the contrary, their amino acid sequences are known and accessible. Sequence analysis can provide new insight on GPCR function. Here, we use a kernel-based statistical machine learning model for the visual exploration of GPCR functional groups from their sequences. This is based on the rich information provided by the model regarding the probability of each sequence belonging to a certain receptor group.

Manuscript from author [PDF]

ES2011-74

Growing Hierarchical Sectors on Sectors

José M Martínez, Pablo Escandell, Emilio Soria, José D Martín, Juan Gómez, Joan Vila

Abstract
Self-organizing maps are widely used in visual data mining. This paper proposes a new visualization approach for GHSOM algorithm, a hierarchical variant of SOM. The method is based on pie charts. That improves the visualization in hierarchical data structures making possible to extract all the existing relationships among the attributes of the neurons at any hierarchy level. The methodology is tested in one synthetic data set and one real data set. Achieved results show the suitability and usefulness of the proposed approach.

Manuscript from author [PDF]

ES2011-56

Analysis of a Reinforcement Learning algorithm using Self-Organizing Maps

Vicente Buendia-Ramon, Emilio Soria, José D Martín, Pablo Escandell, José M Martínez

Abstract
The scenario of this work is defined by the need of many Machine Learning algorithms to tune a number of parameters that define its behavior; the resulting performance can be evaluated with different indices. The relationship between parameters and performance may be neither linear nor straightforward. This work proposes a qualitative approach to the afore-mentioned relationship by using Self-Organizing Maps due to their visual information processing. The approach is evaluated in the framework of Reinforcement Learning algorithms.

Manuscript from author [PDF]

[Back to Top]


Learning theory


ES2011-15

General bound of overfitting for MLP regression models

joseph Rynkiewicz

Abstract
Multilayer perceptrons (MLP) with one hidden layer have been used for a long time to deal with non-linear regression. However, in some task, MLP's are too powerful models and a small mean square error (MSE) may be more due to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS), however in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present an universal bound for the overfitting of such model under smooth assumptions, this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model when the number of data goes to infinite.

Manuscript from author [PDF]

ES2011-78

Maximal Discrepancy vs. Rademacher Complexity for error estimation

Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella

Abstract
The Maximal Discrepancy and the Rademacher Complexity are powerful statistical tools that can be exploited to obtain reliable, albeit not tight, upper bounds of the generalization error of a classifier. We study the different behavior of the two methods when applied to linear classifiers and suggest a practical procedure to tighten the bounds. The resulting generalization estimation can be succesfully used for classifier model selection.

Manuscript from author [PDF]

[Back to Top]


Feature selection and dimensionality reduction


ES2011-8

Class-Specific Feature Selection for One-Against-All Multiclass SVMs

Gaël de Lannoy, Damien Francois, Michel Verleysen

Abstract
This paper proposes a method to perform class-specific feature selection in multiclass addressed solved with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared one against another to elect the winning class. This comparison may be biased towards one specific class when the binary classifiers are built on distinct feature subsets. This paper proposes a normalization of the binary classifiers outputs that allows fair comparisons in such cases.

Manuscript from author [PDF]

ES2011-50

Mutual information for feature selection with missing data

Gauthier Doquire, Michel Verleysen

Abstract
Feature selection is an important task for many machine learning applications; moreover missing data are encoutered very often in practice. This paper proposes to adapt a nearest neighbors based mutual information estimator to handle missing data and to use it to achieve feature selection. Results on arti cial and real world datasets show that the method is able to select important features without the need for any imputation algorithm. Moreover, experiments also indicate that selecting the features before imputing the data generally increases the precision of the prediction models.

Manuscript from author [PDF]

ES2011-19

Probabilistic Fisher discriminant analysis

Charles Bouveyron, Camille Brunet

Abstract
Fisher Discriminant Analysis (FDA) is a popular method for dimensionality reduction and classification but has poor performances in the cases of label noise and sparse labeled data. To overcome these limitations, we propose a probabilistic framework for FDA and extend it to the semi-supervised case. Experiments on real-world datasets show that the proposed approach works as well as FDA in standard situations and outperform it in the label noise and sparse label cases.

Manuscript from author [PDF]

ES2011-73

Supervised dimension reduction mappings

Kerstin Bunte, Michael Biehl, Barbara Hammer

Abstract
We propose a general principle to extend dimension reduction tools to explicit dimension reduction mappings and we show that this can serve as an interface to incorporate prior knowledge in the form of class labels. We explicitly demonstrate this technique by combining locally linear mappings which result from matrix learning vector quantization schemes with the t-distributed stochastic neighbor embedding cost function. The technique is tested on several benchmark data sets.

Manuscript from author [PDF]

[Back to Top]


Learning of causal relations


ES2011-2

Learning of causal relations

John A. Quinn, Joris Mooij, Tom Heskes, Michael Biehl

Abstract
To learn about causal relations between variables just by observing samples from them, particular assumptions must be made about those variables' distributions. This article gives a practical description of how such a learning task can be undertaken based on different possible assumptions. Two categories of assumptions lead to different methods, constraint-based and Bayesian learning, and in each case we review both the basic ideas and some recent extensions and alternatives to them.

Manuscript from author [PDF]

ES2011-93

Inferring the causal decomposition under the presence of deterministic relations

Jan Lemeire, Stijn Meganck, Francesco Cartella, Tingting Liu, Alexander Statnikov

Abstract
The presence of deterministic relations pose problems for current algorithms that learn the causal structure of a system based on the observed conditional independencies. Deterministic variables lead to information equivalences; two sets of variables have the same information about a third variable. Based on information content, one cannot decide on the direct causes. Several edges model equally well the dependencies. We call them equivalent edges. We propose to select among the equivalent edges the one with the simplest descriptive complexity. This approach assumes that the descriptive complexity increases along a causal path. As confirmed by our experimental results, the accuracy of the method depends on the chance of accidental matches of complexities.

Manuscript from author [PDF]

ES2011-106

A unified approach to estimation and control of the False Discovery Rate in Bayesian network skeleton identification

Angelos P. Armen, Ioannis Tsamardinos

Abstract
Constraint-based Bayesian network (BN) structure learning algorithms typically control the False Positive Rate (FPR) of their skeleton identification phase. The False Discovery Rate (FDR), however, may be of greater interest and methods for its utilization by these algorithms have been recently devised. We present a unified approach to BN skeleton identification FDR estimation and control and experimentally evaluate the performance of FDR estimators in both tasks over several networks. We demonstrate that estimation is too conservative for most networks and strong control at common FDR thresholds is not achieved with some networks; finally, we identify the possible causes of this situation.

Manuscript from author [PDF]

ES2011-82

A structure independent algorithm for causal discovery

Tom Claassen, Tom Heskes

Abstract
We present two inference rules, based on so called minimal conditional independencies, that are sufficient to find all invariant arrowheads in a single causal DAG, even when selection bias may be present. It turns out that the set of seven graphical orientation rules that are usually employed to identify these arrowheads are, in fact, just different instances/manifestations of these two rules. The resulting algorithm to obtain all definite causal information is elegant and fast, once the (often surprisingly small) set of minimal independencies is found.

Manuscript from author [PDF]

ES2011-84

Causal relevance learning for robust classification under interventions

Ernest Mwebaze, John A. Quinn, Michael Biehl

Abstract
In some classification problems the distribution of the test data is different from that of the training data because of external manipulations to the variables we observe. We propose a classification scheme which is robust to outside interventions by identifying causes in the training data, given that causes of a target variable remain predictive even when the data is manipulated. We do this be extending Relevance Learning Vector Quantization (RLVQ), a classification scheme that learns a relevance profile for the classification task presented. Our proposed algorithm, Causal-RLVQ, learns a relevance profile that weights causally relevant features more strongly. The algorithm can determine a tradeoff between robustness to intervention and accuracy on non-manipulated data, yielding RLVQ as a special case.

Manuscript from author [PDF]

ES2011-76

A constraint-based approach to incorporate prior knowledge in causal models

Giorgos Borboudakis, Sofia Triantafillou, Vincenzo Lagani, Ioannis Tsamardinos

Abstract
In this paper we address the problem of incorporating prior knowledge, in the form of causal relations, in causal models. We use the framework of Maximal Ancestral Graphs and use a constraint-based approach, allowing the incorporation of prior knowledge in a more flexible manner compared to previous approaches. Finally, we discuss the limitation of the used graphical models to capture causal knowledge, as well as open questions regarding this problem.

Manuscript from author [PDF]

[Back to Top]


Learning II


ES2011-23

Selecting from an infinite set of features in SVM

Remi Flamary, Florian Yger, Alain Rakotomamonjy

Abstract
Dealing with the continuous parameters of a feature extraction method has always been a difficult task that is usually solved by cross-validation. In this paper, we propose an active set algorithm for selecting automatically these parameters in a SVM classification context. Our experiments on texture recognition and BCI signal classification show that optimizing the feature parameters in a continuous space while learning the decision function yields to better performances than using fixed parameters obtained from a grid sampling.

Manuscript from author [PDF]

ES2011-51

Mutual information based feature selection for mixed data

Gauthier Doquire, Michel Verleysen

Abstract
The problem of feature selection is crucial for many applications and has thus been studied extensively. However, most of the existing methods are designed to handle data consisting only in categorical or in real-valued features while a mix of both kinds of features is often encoutered in practice. This paper proposes an approach based on mutual information and the maximal Relevance minimal Redundancy principle to handle the case of mixed data. It combines aspects of both wrapper and fi lter methods and is well suited for regression problems. Experiments on artifi cial and real-world datasets show the interest of the methodology.

Manuscript from author [PDF]

ES2011-91

Unsupervised feature selection for sparse data

Artur Ferreira, Mario Figueiredo

Abstract
Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many features have zero value. In some cases, we do not known the class label for some (or even all) patterns in the dataset, leading us to semi-supervised or unsupervised learning problems. For instance, in text classification with the bag-of-words (BoW) representations, there is usually a large number of features, many of which may be irrelevant (or even detrimental) for categorization tasks. In this paper, we propose one efficient unsupervised feature selection technique for sparse data, suitable for both standard floating point and binary features. The experimental results on standard datasets show that the proposed method yields efficient feature selection, reducing the number of features while simultaneously improving the classification accuracy.

Manuscript from author [PDF]

ES2011-53

Multi-class classification in the presence of labelling errors

Jakramate Bootkrajang, Ata Kaban

Abstract
Learning a classifier from a training set that contains labelling errors is a difficult, yet not very well studied problem. Here we present a model-based approach that extends multi-class quadratic normal discriminant analysis with a model of the mislabelling process. We demonstrate the benefits of this approach in terms of parameter recovery as well as improved classification performance, on both synthetic and real-world multi-class problems. We also obtain enhanced accuracy in comparison with a previous model-free approach.

Manuscript from author [PDF]

ES2011-28

Principal component analysis for unsupervised calibration of bio-inspired airflow array sensors

Michiel Van Dyck, Herbert Peremans

Abstract
This paper describes the automatic calibration of a set of air flow sensitive sensors on a robot exposed to unknown random air flow stimuli. This might support the idea that the cricket cercus neural system in the terminal abdominal ganglion is evolved by learning. The algorithm makes use of the singular value decomposition (SVD) and the known reduced model dimension of the system for learning the sensor array setup. The absolute orientation of the array can only be found in function of a reference flow or reference sensor which must be calibrated manually. When only a change in airflow measure is needed, the reference sensor can be left uncalibrated.

Manuscript from author [PDF]

ES2011-27

Effects of sparseness and randomness of pairwise distance matrix on t-SNE results

Eli Parviainen

Abstract
We apply ideas from random graph theory to sparse pairwise distance matrices in dimension reduction. We use matrices with some short and some randomly chosen distances, and study effects of matrix sparseness and randomness on trustworthiness and continuity of t-SNE visualizations. The existing works have either concentrated on matrices with only short distances, or implemented heuristics with mixed distances without explaining the effects. We find that trustworthiness generally increases with randomness, but not without limit. Continuity is less affected, but drops if matrices become too random. Sparseness has little effect on continuity, but decreases trustworthiness. Decrease in quality appears sublinear, which suggests that sparse t-SNE could be made subquadratic in complexity without too much effect on quality.

Manuscript from author [PDF]

ES2011-38

Nearest neighbors and correlation dimension for dimensionality estimation. Application to factor analysis of real biological time series data.

Jérôme Lapuyade-Lahorgue, Ali Mohammad-Djafari

Abstract
Determining the number of components in dimensionality reduction techniques is still one of the open problems of research on data analysis. These methods are often used in knowledge extraction of multivariate great dimensional data, but very often the number of components is assumed to be known. One of the classical methods to estimate this dimensionality is based on the Principal Components Analysis (PCA) eigenvalues. However, this method supposes that the model is linear and the signals are Gaussian. To be able to consider non-linear and non-Gaussian cases, we propose in this paper “measure based methods” as nearest neighbors dimension and correlation dimension. The comparaison between the three methods is evaluated both with simulated data and with real biological data, which are gene expression time series. The main goal of this study is to estimate the minimum number of factors.

Manuscript from author [PDF]

ES2011-61

A Similarity Function with Local Feature Weighting for Structured Data

Rubén Suárez, Rocío García-Durán, Fernando Fernández

Abstract
The application of learning approaches as Kernel or Instance Based methods to tree structured data requires the definition of similarity functions able to deal with such data. A new similarity function the for nearest prototype classification in relational data that follows a tree structure is defined in this paper. Its main characteristic is its capability to weight the importance of the different data features in different areas of the feature space. This work is built over two previous ideas:a similarity function for Local Feature Weighting (LFW), and a Relational Nearest Prototype Classification algorithm (RNPC).

Manuscript from author [PDF]

ES2011-83

Exploiting vertices states in GraphESN by weighted nearest neighbor

Claudio Gallicchio, Alessio Micheli

Abstract
Graph Echo State Networks (GraphESN) extend the Reservoir Computing approach to directly process graph structures. The reservoir is applied to every vertex of an input graph, realizing a contractive encoding process and resulting in a structured state isomorphic to the input. Whenever an unstructured output is required, a state mapping function maps the structured state into a fixed-size feature representation that feeds the linear readout. In this paper we propose an alternative approach, based on distance-weighted nearest neighbor, to realize a more flexible readout exploiting the state information computed for every vertex according to its individual relevance.

Manuscript from author [PDF]

ES2011-86

The role of Fisher information in primary data space for neighbourhood mapping

Héctor Ruiz, Ian Jarman, José D Martín, Paulo Lisboa

Abstract
Clustering methods and nearest neighbour classifiers typically compute distances between data points as a measure of similarity, with nearby pairs of points considered more like each other than remote pairs. The distance measure of choice is often Euclidean, implicitly treating all directions in space as equally relevant. This paper reviews the application of Fisher information to derive a metric in primary data space. The aim is to provide a natural coordinate space to represent pairwise distances with respect to a probability distribution p(c|x), defined by an external label c, and use it to compute more informative distances.

Manuscript from author [PDF]

ES2011-95

Communication Challenges in Cloud K-means

Fabrice Rossi, Matthieu Durut

Abstract
This paper studies how parallel machine learning algorithms can be implemented on top of Microsoft Windows Azure cloud computing platform. More specifically, the proposed algorithm focuses on communication mechanisms through the storage.

Manuscript from author [PDF]

ES2011-107

A Spectral Based Clustering Algorithm for Categorical Data with Maximum Modularity

Lazhar Labiod, Younès Bennani

Abstract
In this paper we propose a spectral based clustering algorithm to maximize an extended Modularity measure for categorical data; first, we establish the connection with the Relational Analysis criterion. Second, the maximization of the extended modularity is shown as a trace maximization problem. A spectral based algorithm is then presented to search for the partitions maximizing the extended Modularity criterion. Experimental results indicate that the new algorithm is efficient and effective at finding a good clustering across a variety of real world data sets

Manuscript from author [PDF]

ES2011-41

Single-trial P300 detection with Kalman filtering and SVMs

Lucie Daubigney, Olivier Pietquin

Abstract
Brain Computer Interfaces (BCI) are systems enabling humans to communicate with machines through signals generated by the brain. Several kinds of signals can be envisioned as well as means to measure them. In this paper we are particularly interested in even-related brain potentials (ERP) and especially visually-evoked potential signals (P300) measured with surface electroencephalograms (EEG). When the human is stimulated with visual inputs, the P300 signals arise about 300 ms after the visual stimulus has been received. Yet, the EEG signal is often very noisy which makes the P300 detection hard. It is customary to use an average of several trials to enhance the P300 signal and reduce the random noise but this results in a lower bit rate of the interface. In this contribution, we propose a novel approach to P300 detection using Kalman filtering and SVMs. Experiments show that this method is a promising step toward single-trial detection of P300.

Manuscript from author [PDF]

ES2011-35

Classifying mental states with machine learning algorithms using alpha activity decline

Carina Walter, Gabriele Cierniak, Peter Gerjets, Wolfgang Rosenstiel, Martin Bogdan

Abstract
This publication aims at developing computer based learning environments adapting to learners’ individual cognitive condition. The adaptive mechanism, based on Brain-Computer-Interface (BCI) methodology, relays on electroencephalogram (EEG)-data to diagnose learners’ mental states. A first within-subjects study (10 students) was accomplished aiming at differentiating between states of learning and non-learning by means of EEG-data. Support-Vector-Machines classified characteristics in the EEG-signals for these two different stimuli on average as 74.55% correct. For individual students the percentage of correct classification reached 92.22%. The results indicate that continuous EEG-data combined with BCI methodology is a promising approach to measuring learners’ mental states online.

Manuscript from author [PDF]

ES2011-30

Approaches for Automatic Speaker Recognition in a Binaural Humanoid Context

Karim Youssef, Bastien Breteau, Sylvain Argentieri, Jean-Luc Zarader, Zefeng Wang

Abstract
This paper presents two methods of Automatic Speaker Recognition (ASkR). ASkR has been largely studied in the last decades, but in most cases in mono-microphone or microphone array contexts. Our systems are placed in a binaural humanoid context where the signals captured by both ears of a humanoid robot will be exploited to perform the ASkR. Both methods use Mel-Frequency Cepstral Coding (MFCC), but one performs the classification with Predictive Neural Networks (PNN) and the other performs it with Gaussian Mixture Models (GMM). Tests are made on a database simulating the functioning of the human ears. They study the influence of noise, reverberations and speaker spatial position on the recognition rate.

Manuscript from author [PDF]

ES2011-37

Fast Data Mining with Sparse Chemical Graph Fingerprints by Estimating the Probability of Unique Patterns

Georg Hinselmann, Lars Rosenbaum, Andeas Jahn, Andreas Zell

Abstract
The aim of this work is to introduce a modification of chemical graphs fingerprints for data mining. The algorithm reduces the number of features by taking the probability of producing an unique feature at a specific search depth into account. We observed the probability of generating a non-unique feature depending on a search parameter (which leads to a power-law growths of features) and modeled it by a sigmoid function. This function was integrated into a fingerprinting routine to reduce the features according to their probability. The predictive performance was convincing with a considerable speedup for the training of a linear support vector machine for sparse instances.

Manuscript from author [PDF]

ES2011-85

Automatic Enhancement of Correspondence Detection in an Object Tracking System

Denis Schulze, Sven Wachsmuth, Katharina J. Rohlfing

Abstract
This paper proposes a strategy to automatically detect the correspondence between measurements of different sensors using object tracking. In addition the strategy includes the ability to learn new features to facilitate the correspondence computation for future measurements. Therefore first a correlation between objects of different modalities is computed using time synchronous changes of attribute values. Using statistical methods to determine the dependencies between changes of different attributes it is shown how a multi layer perceptron (MLP) can be used to enhance the correspondence detection in ambiguous situations.

Manuscript from author [PDF]

[Back to Top]


Sequence and time processing


ES2011-70

Time Experiencing by Robotic Agents

Michail Maniadakis, Marc Wittmann, Panos Trahanias

Abstract
Biological organisms perceive and act in the world based on spatiotemporal experiences and interpretations. However, artificial agents consider mainly the spatial relationships that exist in the world, typically ignoring its temporal aspects. In an attempt to direct research interest towards the fundamental issue of time experiencing, the current work explores two temporally different versions of a robotic rule switching task. An evolutionary process is employed to design a neural network controller capable of accomplishing both versions of the task. The systematic exploration of neural network dynamics revealed a self-organized time perception capacity in the agent's cognitive system that significantly facilitates the accomplishment of tasks, through the modulation of the supplementary behavioural and cognitive processes.

Manuscript from author [PDF]

ES2011-45

Visual place recognition using Bayesian filtering with Markov chains

Mathieu Dubois, Hervé Guillaume, Emmanuelle Frenoux, Philippe Tarroux

Abstract
We present a novel idea to use Bayesian filtering in the case of place recognition. More precisely, our system combines global image characterization, Learned Vector Quantization, Markov chains and Bayesian filtering. The goal is to integrate several images seen by a robot during exploration of the environment and the dependency between them. We present our system and the new Bayesian filtering algorithm. Our system has been evaluated on a standard database and shows promising results.

Manuscript from author [PDF]

ES2011-52

Iterative multi-task sequence labeling for predicting structural properties of proteins

Francis Maes, Julien Becker, Louis Wehenkel

Abstract
Developing computational tools for predicting protein structural information given their amino acid sequence is of primary importance in protein science. Problems, such as the prediction of secondary structures, of solvent accessibility, or of disordered regions, can be expressed as sequence labeling problems and could be solved independently by existing machine learning based sequence labeling approaches. But, since these problems are closely related, we propose to rather approach them jointly in a multi-task approach. To this end, we introduce a new generic framework for iterative multi-task sequence labeling. We apply this - conceptually simple but quite effective - strategy to jointly solve a set of five protein annotation tasks. Our empirical results with two protein datasets show that the proposed strategy significantly outperforms the single-task approaches.

Manuscript from author [PDF]

ES2011-54

Identification of sparse spatio-temporal features in Evoked Response Potentials

Nisrine Jrad, Marco Congedo

Abstract
Electroencephalographic Evoked Response Potentials (ERP)s exhibit distinct and individualized spatial and temporal characteristics. Identification of spatio-temporal features improves single-trial classification performance and allows a better understanding of the underlying physiology. This paper presents a method for analyzing the spatio-temporal characteristics associated with Error related Potentials (ErrP)s. First, a resampling procedure based on Global Field Power (GFP) extracts temporal features. Second, a spatially weighted SVM (sw-SVM) is proposed to learn a spatial filter optimizing the classification performance for each temporal feature. Third, the so obtained ensemble of sw-SVM classifiers are combined using a weighted combination of all sw-SVM outputs. Results indicate that inclusion of temporal features provides useful insight regarding the spatio-temporal characteristics of error potentials.

Manuscript from author [PDF]

ES2011-102

Hybrid HMM and HCRF model for sequence classification

Yann Soullard, thierry Artieres

Abstract
We propose a hybrid model combining a generative model and a discriminative model for signal labelling and classification tasks, aiming at taking the best from each world. The idea is to focus the learning of the discriminative model on most likely state sequences as output by the generative model. This allows taking advantage of the usual increased accuracy of generative models on small training datasets and of discriminative models on large training datasets. We instantiate this framework with Hidden Markov Models and Hidden Conditional Random Fields. We validate our model on financial time series and on handwriting data.

Manuscript from author [PDF]

ES2011-105

A Neural Filter for Electrolocation in Weakly Electric Fish

DaeEun Kim

Abstract
Weakly electric fish have the electric organ to generate the electric field and electrosensory mechanism to read the change of electric field with their electroreceptors. The electric organ produces a waveform of electric field as their electric organ discharge (EOD). Their active electrolocation system can detect distortion of the self-generated electric field, which is caused by a target object, and estimate the position of the target object. In this paper, we suggest a hypothesis that the periodic EOD signals can be involved to extract localization features from the noisy electrosensory signals and then provide a possible neural network to process noise-filtering to obtain the accurate information of a target position. The neural network has sinusoidal weights to process a time series of sensor readings for each electroreceptor and to obtain the original clean signal purely depending on object perturbation.

Manuscript from author [PDF]

[Back to Top]


Optimization and learning


ES2011-75

Non-linearly increasing resampling in racing algorithms

Verena Heidrich-Meisner, Christian Igel

Abstract
Racing algorithms are iterative methods trying to find the best among several options with high probability. The quality of each option is a random variable and is estimated by its empirical mean and concentration bounds obtained from repeated sampling. In each iteration of a standard racing algorithm each promising option is reevaluated once before being statistically compared with its competitors. We argue that Hoeffding and empirical Bernstein races benefit from decoupling the racing iteration from the number of samples per option and illustrate this on an artificial benchmark problem.

Manuscript from author [PDF]

ES2011-13

A distributed learning algorithm based on two-layer artificial neural networks and genetic algorithms

Diego Peteiro-Barral, Bertha Guijarro-Berdiñas, Beatriz Pérez-Sánchez, Oscar Fontenla-Romero

Abstract
In many real-world applications of machine learning, the amount of data is now beyond the capability of learning algorithms because they cannot process all available data in a reasonable time. Moreover, most large datasets are naturally distributed or they are being stored in a distributed manner. A promising line of research in order to deal with large and/or distributed data is distributed learning. We propose a new distributed learning algorithm based on two-layer artificial neural networks and genetic algorithms. The results obtained show that our method performs better than other distributed learning algorithms.

Manuscript from author [PDF]

[Back to Top]


Deep Learning


ES2011-4

An Introduction to Deep Learning

Ludovic Arnold, Sébastien Rebecchi, Sylvain Chevallier, Hélène Paugam-Moisy

Abstract
The deep learning paradigm tackles problems on which shallow architectures (e.g. SVM) are affected by the curse of dimensionality. As part of a two-stage learning scheme involving multiple layers of nonlinear processing a set of statistically robust features is automatically extracted from the data. The present tutorial introducing the ESANN deep learning special session details the state-of-the-art models and summarizes the current understanding of this learning approach that is a reference for many difficult classification tasks.

Manuscript from author [PDF]

ES2011-10

Using very deep autoencoders for content-based image retrieval

Alex Krizhevsky, Geoffrey Hinton

Abstract
We show how to learn many layers of features on color images and we use these features to initialize deep autoencoders. We then use the autoencoders to map images to short binary codes. Using semantic hashing, 28-bit codes can be used to retrieve images that are similar to a query image in a time that is independent of the size of the database. This extremely fast retrieval makes it possible to search using multiple different transformations of the query image. 256-bit binary codes allow much more accurate matching and can be used to prune the set of images found using the 28-bit codes.

Manuscript from author [PDF]

ES2011-79

Training RBMs based on the signs of the CD approximation of the log-likelihood derivatives

Asja Fischer, Christian Igel

Abstract
Contrastive Divergence (CD) learning is frequently applied to Restricted Boltzmann Machines (RBMs), the building blocks of deep believe networks. It relies on biased approximations of the log-likelihood gradient. This bias can deteriorate the learning process. It was claimed that the signs of most components of the CD update are equal to the corresponding signs of the log-likelihood gradient. This suggests using optimization techniques only depending on the signs. Resilient backpropagation is such a method and we combine it with CD learning. However, it does not prevent divergence caused by the approximation bias.

Manuscript from author [PDF]

ES2011-21

A supervised strategy for deep kernel machine

Florian Yger, Maxime Berar, Gilles Gasso, Alain Rakotomamonjy

Abstract
This paper presents an alternative approach for learning a Multilayer Kernel Machine (MKM) based on kernel partial least squares. After explaining the caracteristics and motivation of our approach, we show compelling evidences on real world applications.

Manuscript from author [PDF]

[Back to Top]