ESANN2015

23rd European Symposium on Artificial Neural Networks
Bruges, Belgium, April 22-24, 2015


Content of the proceedings




Prototype-based and weightless models


ES2015-68

Median-LVQ for classification of dissimilarity data based on ROC-optimization

David Nebel, Thomas Villmann

Abstract
In this article we consider a median variant of the learning vector quantization (LVQ) classifier for the classification of dissimilarity data. Besides the median aspect, we propose to optimize the receiver-operating characteristics (ROC) instead of the classification accuracy. In particular, we present a probabilistic LVQ model with an adaptation scheme based on a generalized Expectation-Maximization procedure, which allows a maximization of the area under the ROC curve for such dissimilarity data. The basic idea behind it is the use of ordered pairs as structured input for learning. The new scheme can be seen as a supplement to the recently introduced LVQ scheme for ROC optimization of vector data.

Manuscript from author [PDF]

ES2015-88

Certainty-based prototype insertion/deletion for classification with metric adaptation

Lydia Fischer, Barbara Hammer, Heiko Wersing

Abstract
We propose an extension of prototype-based classification models to automatically adjust model complexity, thus offering a powerful technique for online, incremental learning tasks. The incremental technique is based on the notion of the certainty of an observed classification. Unlike previous work, we can incorporate matrix learning into the framework by relying on the cost function of generalised learning vector quantisation (GLVQ) for prototype insertion, deletion, as well as training. In several benchmarks, we demonstrate that the proposed method provides comparable results to offline counterparts and an incremental support vector machine, while enabling a better control of the required memory.

Manuscript from author [PDF]

ES2015-35

Learning matrix quantization and variants of relevance learning

Kristin Domaschke, Marika Kaden, Mandy Lange, Thomas Villmann

Abstract
We propose an extension of the learning vector quantization framework for matrix data. Data in matrix form occur in several areas like gray-scale images, time-dependent spectra or fMRI data. If the matrix data are vectorized, important spatial information may be lost. Thus, processing matrix data in matrix form seems to be more appropriate. However, it requires matrix dissimilarities for data comparison. Here Schatten-$p$-norms come into play. We show that they can be used in a natural way replacing the vector dissimilarities in the learning framework. Moreover, we transfer the concept of vector relevance learning also to this new matrix variant. We apply the resulting learning matrix quantization approach to the classification of time-dependent fluorescence spectra as an exemplary real world application.
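
The Schatten-$p$-norm is simply the $\ell_p$ norm of a matrix's singular values, so a matrix dissimilarity of the kind named above can be sketched in a few lines of NumPy (illustrative only, not code from the paper); $p=2$ recovers the Frobenius norm and $p=1$ the nuclear norm.

    import numpy as np

    def schatten_p_dissimilarity(A, B, p=2):
        # Schatten-p norm of the difference A - B: the l_p norm of its singular values
        s = np.linalg.svd(A - B, compute_uv=False)
        return np.sum(s ** p) ** (1.0 / p)

    A, B = np.random.rand(8, 5), np.random.rand(8, 5)
    print(schatten_p_dissimilarity(A, B, p=2), np.linalg.norm(A - B, "fro"))  # identical for p=2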

Manuscript from author [PDF]

ES2015-26

A WiSARD-based multi-term memory framework for online tracking of objects

Daniel Nascimento, Rafael Carvalho, Félix Mora-Camino, Priscila Lima, Felipe França

Abstract
In this paper a generic object tracker with real-time performance is proposed. The proposed tracker is inspired by hierarchical short-term and medium-term memories, for which patterns are stored as discriminators of a WiSARD weightless neural network. This approach is evaluated on benchmark video sequences published by Babenko et al. Experiments show that the WiSARD-based approach outperforms most of the previous results in the literature with respect to the same dataset.

Manuscript from author [PDF]

ES2015-100

Memory Transfer in DRASiW-like Systems

Massimo De Gregorio, Maurizio Giordano

Abstract
DRASiW is an extension of a weightless NN model, namely WiSARD, with the capability of storing, in an internal data structure called the "mental image" (MI), the frequencies of the patterns seen during the training stage. Thanks to this capability, together with the possibility of processing MIs in reverse to generate synthetic prototypes of training samples, in this paper we show how, in DRASiW-like systems, memory can be transferred between different systems while preserving their functionality.

Manuscript from author [PDF]

ES2015-73

Combining dissimilarity measures for prototype-based classification

Ernest Mwebaze, Gjalt Bearda, Michael Biehl, Dietlind Zuehlke

Abstract
Prototype-based classification has been used successfully for classification tasks where interpretability of the output of the system is key. Prototypes are representative of the data and, together with a suitable measure of dissimilarity, parameterize the classifier. In many practical problems, the same object is represented by a collection of qualitatively different subsets of features, each of which might require a different dissimilarity measure. In this paper we present a novel technique for combining different dissimilarity measures into one classification scheme for heterogeneous, mixed data. To illustrate the method we apply a select class of prototype-based classifiers, LVQ, to the problem of diagnosing viral crop disease in cassava plants. We combine different dissimilarity measures related to features extracted from leaf images including histograms (HSV) and shape features (SIFT). Our results show the feasibility of the method, increased performance compared to previous methods and improved interpretability of the systems.

Manuscript from author [PDF]



Emerging techniques and applications in multi-objective reinforcement learning


ES2015-15

Multi-objective optimization perspectives on reinforcement learning algorithms using reward vectors

Madalina Drugan

Abstract
Reinforcement learning is a machine learning area that studies which actions an agent can take in order to optimize a cumulative reward function. Recently, a new class of reinforcement learning algorithms with multiple, possibly conflicting, reward functions was proposed. We call this class of algorithms the multi-objective reinforcement learning (MORL) paradigm. We give an overview of multi-objective optimization techniques imported into MORL and of its theoretically simplified single-state variant, the multi-objective multi-armed bandits (MOMAB) paradigm.

Manuscript from author [PDF]

ES2015-27

Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem

Saba Yahyaa, Bernard Manderick

Abstract
The multi-objective, multi-armed bandit (MOMAB) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single scalar reward, and these multiple rewards might be conflicting. The MOMAB problem has a set of Pareto optimal arms, and an agent's goal is not only to find that set but also to play the arms in that set evenly or fairly. To find the Pareto optimal arms, either a linear scalarized function or the Pareto dominance relation can be used. The linear scalarized function converts the multi-objective optimization problem into a single-objective one and is a very popular approach because of its simplicity. The Pareto dominance relation optimizes the multi-objective problem directly. In this paper, we extend the Thompson Sampling policy to the MOMAB problem. We propose Pareto Thompson Sampling and linear scalarized Thompson Sampling approaches and compare them empirically on a test suite of MOMAB problems with Bernoulli distributions. Pareto Thompson Sampling is the approach with the best empirical performance.
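
To make the Pareto Thompson Sampling idea concrete, the NumPy sketch below gives one plausible reading of it (Beta posteriors per arm and objective, a sampled Pareto front, and a uniformly random pull among its members); the paper's exact algorithm and parameterization may differ.

    import numpy as np

    rng = np.random.default_rng(0)
    K, D, T = 5, 2, 2000                              # arms, objectives, rounds
    true_p = rng.uniform(0.2, 0.9, (K, D))            # unknown Bernoulli success probabilities
    alpha, beta = np.ones((K, D)), np.ones((K, D))    # Beta posteriors per arm and objective

    def pareto_front(v):
        # indices of rows of v not dominated by any other row
        return [i for i in range(len(v))
                if not any(np.all(v[j] >= v[i]) and np.any(v[j] > v[i]) for j in range(len(v)))]

    for t in range(T):
        theta = rng.beta(alpha, beta)           # one sampled mean vector per arm
        arm = rng.choice(pareto_front(theta))   # play an arm that is Pareto optimal under the sample
        reward = rng.binomial(1, true_p[arm])   # vector of Bernoulli rewards
        alpha[arm] += reward
        beta[arm] += 1 - reward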

Manuscript from author [PDF]

ES2015-65

Pareto Local Search for MOMDP Planning

Chiel Kooijman, Maarten De Waard, Maarten Inja, Diederik Roijers, Shimon Whiteson

Abstract
Standard single-objective methods such as dynamic programming are not applicable to Markov decision processes (MDPs) with multiple objectives because they depend on a maximization over rewards, which is not defined if the rewards are multi-dimensional. As a result, special multi-objective algorithms are needed to find a set of policies that contains all optimal trade-offs between objectives, i.e. a set of Pareto-optimal policies. In this paper, we propose Pareto Local Policy Search (PLoPS), a new planning method for multi-objective MDPs (MOMDPs) based on Pareto Local Search (PLS). This method produces a good set of policies by iteratively scanning the neighbourhood of locally non-dominated policies for improvements. It is fast because neighbouring policies can be quickly identified as improvements, and their values can be computed incrementally. We test the performance of PLoPS on several MOMDP benchmarks, and compare it to popular decision-theoretic and evolutionary alternatives. The results indicate that PLoPS outperforms the alternatives.

Manuscript from author [PDF]

ES2015-33

Bernoulli bandits: an empirical comparison

Nixon Ronoh, Reuben Odoyo, Edna Milgo, Madalina Drugan, Bernard Manderick

Abstract
We compare empirically a representative sample of action selection policies on a test suite of Bernoulli multi-armed bandit problems. For such problems the rewards are either success or failure, following a Bernoulli distribution with unknown success probability. The number of arms in our test suite ranges from small to large and for each number of arms we consider several distributions of the success probabilities. Our selection consists of the following action selection policies: ε-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, and the knowledge gradient. In this paper, we report the case of ten arms. A forthcoming technical report will also cover bandits other than Bernoulli bandits and will describe the experimental results for all multi-armed bandit problems under several parameter settings.
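
A minimal version of such a comparison can be reproduced in a few lines; the sketch below (not from the paper) pits ε-greedy against plain UCB1 on ten Bernoulli arms and reports cumulative regret. UCB1-Tuned, the Gittins index policy, Thompson sampling and the knowledge gradient are omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(1)
    p = rng.uniform(0.1, 0.9, 10)        # ten Bernoulli arms with unknown success probabilities
    T = 5000

    def run(select):
        counts, sums, regret = np.zeros(len(p)), np.zeros(len(p)), 0.0
        for t in range(1, T + 1):
            a = select(counts, sums, t)
            r = rng.binomial(1, p[a])
            counts[a] += 1; sums[a] += r
            regret += p.max() - p[a]
        return regret

    def eps_greedy(counts, sums, t, eps=0.1):
        if counts.min() == 0 or rng.random() < eps:
            return int(rng.integers(len(counts)))          # explore
        return int(np.argmax(sums / counts))               # exploit the empirical best arm

    def ucb1(counts, sums, t):
        if counts.min() == 0:
            return int(np.argmin(counts))                  # pull each arm once first
        return int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))

    print("eps-greedy regret:", run(eps_greedy), " UCB1 regret:", run(ucb1))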

Manuscript from author [PDF]



Sequence learning and time series


ES2015-118

Learning Recurrent Dynamics using Differential Evolution

Sebastian Otte, Fabian Becker, Martin V. Butz, Marcus Liwicki, Andreas Zell

Abstract
This paper presents an efficient and powerful approach for learning dynamics with Recurrent Neural Networks (RNNs). No specialized or fine-tuned RNNs are used, but rather standard RNNs with one fully connected hidden layer. The training procedure is based on a variant of Differential Evolution (DE) with a novel mutation scheme that allows the population size in our setup to be reduced to five, yet still yields very good results even within a few generations. For several common Multiple Superimposed Oscillator (MSO) instances new state-of-the-art results are presented, which are across the board several orders of magnitude better than the results published so far. Furthermore, for new and even more difficult instances, i.e., MSO9-MSO12, our setup achieves lower error rates than previously reported for the best system on MSO8.

Manuscript from author [PDF]

ES2015-31

Comparison of Numerical Models and Statistical Learning for Wind Speed Prediction

Nils André Treiber, Stephan Späth, Justin Heinermann, Lueder von Bremen, Oliver Kramer

Abstract
After decades in which wind forecasts were dominated by numerical weather predictions, statistical models have recently gained attention for shortest-term forecast horizons. A rigorous experimental comparison between both model types is rare. In this paper, we compare COSMO-DE EPS forecasts from the German Meteorological Service (DWD), post-processed with non-homogeneous Gaussian regression, to a multivariate support vector regression model. Further, a hybrid model is introduced that employs a weighted prediction of both approaches.

Manuscript from author [PDF]

ES2015-39

Solar PV Power Forecasting Using Extreme Learning Machine and Information Fusion

Hélène Le Cadre, Ignacio Aravena, Anthony Papavasiliou

Abstract
We provide a learning algorithm combining distributed Extreme Learning Machine and an information fusion rule based on the aggregation of expert advice, to build day-ahead probabilistic solar PV power production forecasts. These forecasts use, apart from the current day's solar PV power production, local meteorological inputs, the most valuable of which is shown to be precipitation. Experiments are then run in one French region, Provence-Alpes-Côte d'Azur, to evaluate the algorithm's performance.

Manuscript from author [PDF]

ES2015-54

Gaussian process modelling of multiple short time series

Hande Topa, Antti Honkela

Abstract
We study effective Gaussian process (GP) modelling of multiple short time series. These problems are common, for example, when applying GP models independently to each gene in a gene expression time series data set. Such sets typically contain very few time points and hence naive application of common GP modelling techniques can lead to severe overfitting in a significant fraction of the fitted models, depending on the details of the data set. We propose avoiding overfitting by constraining the GP length-scale to values that are compatible with the spacing of the time points. We demonstrate that this eliminates otherwise serious overfitting in a real experiment using a GP model to rank SNPs based on their likelihood of being under natural selection.
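
As an illustration of the length-scale constraint (the paper does not use scikit-learn; this is only one way such a bound could be imposed), the sketch below lower-bounds the RBF length-scale of a GP by the minimum spacing of the observed time points.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    t = np.array([0.0, 2.0, 4.0, 8.0, 16.0]).reshape(-1, 1)   # few, unevenly spaced time points
    y = np.sin(t / 5.0).ravel() + 0.1 * np.random.default_rng(2).normal(size=len(t))

    min_spacing = np.min(np.diff(t.ravel()))
    # forbid length-scales shorter than the time-point spacing to avoid
    # degenerate fits that interpolate every observation exactly
    kernel = RBF(length_scale=4.0, length_scale_bounds=(min_spacing, 1e3)) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)
    print(gp.kernel_)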

Manuscript from author [PDF]

ES2015-56

Long Short Term Memory Networks for Anomaly Detection in Time Series

Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, Puneet Agarwal

Abstract
Long Short Term Memory (LSTM) networks have been demonstrated to be particularly useful for learning sequences containing longer term patterns of unknown length, due to their ability to maintain long term memory. Stacking recurrent hidden layers in such networks also enables the learning of higher level temporal features, for faster learning with sparser representations. In this paper, we use stacked LSTM networks for anomaly/fault detection in time series. A network is trained on non-anomalous data and used as a predictor over a number of time steps. The resulting prediction errors are modeled as a multivariate Gaussian distribution, which is used to assess the likelihood of anomalous behavior. The efficacy of this approach is demonstrated on four datasets: ECG, space shuttle, power demand, and a multi-sensor engine dataset.
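
The error-modelling step can be sketched independently of the network; in the illustrative code below a trivial persistence forecaster stands in for the trained stacked LSTM (an assumption made for brevity), prediction-error vectors on normal data are fitted with a multivariate Gaussian, and low log-likelihood flags anomalies.

    import numpy as np
    from scipy.stats import multivariate_normal

    def predict(window, l=3):
        # stand-in for a trained stacked LSTM: persistence forecast of the next l values
        return np.repeat(window[-1], l)

    def prediction_errors(series, w=10, l=3):
        # one l-dimensional prediction-error vector per time step
        return np.array([predict(series[i - w:i], l) - series[i:i + l]
                         for i in range(w, len(series) - l)])

    normal_series = np.sin(np.linspace(0, 20, 400))
    E = prediction_errors(normal_series)
    mu, cov = E.mean(axis=0), np.cov(E, rowvar=False) + 1e-6 * np.eye(E.shape[1])
    gauss = multivariate_normal(mu, cov)

    test = np.r_[np.sin(np.linspace(0, 10, 200)), np.random.default_rng(3).normal(0, 1, 50)]
    scores = gauss.logpdf(prediction_errors(test))
    anomalous = scores < np.percentile(gauss.logpdf(E), 1)    # threshold from normal data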

Manuscript from author [PDF]

ES2015-91

Hierarchical, prototype-based clustering of multiple time series with missing values

Pekka Wartiainen, Tommi Kärkkäinen

Abstract
A novel technique to divide a given set of multiple time series containing missing values into disjoint subsets is proposed. With the hierarchical approach that combines a robust clustering algorithm and multiple cluster indices, we are able to generate a dynamic, decision-tree-like structure to represent the original data in the leaf nodes. The whole algorithm is first described and then tested with one particular data set from the UCI repository, already used in [Kärkkäinen et al., 2014] for a similar exploration. The obtained results are very promising.

Manuscript from author [PDF]



Regression and prediction


ES2015-12

Fast greedy insertion and deletion in sparse Gaussian process regression

Jens Schreiter, Duy Nguyen-Tuong, Heiner Markert, Michael Hanselmann, Marc Toussaint

Abstract
In this paper, we introduce a new and straightforward criterion for successive insertion and deletion of training points in sparse Gaussian process regression. Our novel approach is based on an approximation of the selection technique proposed by Smola and Bartlett. It is shown that the resulting selection strategies are as fast as the purely randomized schemes for insertion and deletion of training points. Experiments on real-world robot data demonstrate that our obtained regression models are competitive with the computationally intensive state-of-the-art methods in terms of generalization accuracy.

Manuscript from author [PDF]

ES2015-77

Using self-organizing maps for regression: the importance of the output function

Thomas Hecht, Mathieu Lefort, Alexander Gepperth

Abstract
The self-organizing map (SOM) is a powerful paradigm that is extensively applied for clustering and visualization purposes. It is also used for regression learning, especially in robotics, thanks to its ability to provide a topological projection of high-dimensional non-linear data. In this case, the data extracted from the SOM are usually restricted to the best matching unit (BMU), which is the usual way to use the SOM for classification, where class labels are attached to individual neurons. In this article, we investigate the influence of considering more information from the SOM than just the BMU when performing regression. For this purpose, we quantitatively study several output functions for the SOM, when using these data as input of a linear regression, and find that using activities in addition to the BMU can strongly improve regression performance. Thus, we propose a unified and generic framework that embraces a large spectrum of models, from the traditional way to use the SOM, with the best matching unit as output, to models related to the radial basis function network paradigm, when using local receptive fields as output.

Manuscript from author [PDF]

ES2015-107

Using the Mean Absolute Percentage Error for Regression Models

Arnaud de Myttenaere, Boris Golden, Bénédicte Le Grand, Fabrice Rossi

Abstract
We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression. We show that universal consistency of Empirical Risk Minimization remains possible using the MAPE instead of the MAE.
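
The central identity is easy to verify numerically: the MAPE of a prediction equals a MAE in which each residual is weighted by $1/|y_i|$. A tiny NumPy check (illustrative only):

    import numpy as np

    y = np.array([10.0, 50.0, 200.0])
    y_hat = np.array([12.0, 45.0, 180.0])

    mape = np.mean(np.abs(y - y_hat) / np.abs(y))

    w = 1.0 / np.abs(y)                                    # per-example weights
    weighted_mae = np.sum(w * np.abs(y - y_hat)) / len(y)  # weighted MAE

    print(mape, weighted_mae)   # identical values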

Manuscript from author [PDF]

ES2015-81

Survival Analysis with Cox Regression and Random Non-linear Projections

Samuel Branders, Benoît Frénay, Pierre Dupont

Abstract
Proportional Cox hazard models are commonly used in survival analysis, since they define risk scores which can be directly interpreted in terms of hazards. Yet they cannot account for non-linearities in their covariates. This paper shows how to use random non-linear projections to efficiently address this limitation.

Manuscript from author [PDF]

ES2015-135

Ensemble Learning with Dynamic Ordered Pruning for Regression

Kaushala Dias, Terry Windeatt

Abstract
A novel method of introducing diversity into ensemble learning predictors for regression problems is presented. The proposed method prunes the ensemble while simultaneously training, as part of the same learning process. Here ensemble members are trained selectively rather than all being trained, resulting in a diverse selection of ensemble members that have strengths in different parts of the training set. The result is that the prediction accuracy and generalization ability of the trained ensemble are enhanced. Pruning heuristics attempt to combine accurate yet complementary members; this method therefore enhances performance by dynamically modifying the pruned aggregation, distributing the ensemble member selection over the entire dataset. A comparison is drawn with Negative Correlation Learning and a static ensemble pruning approach used in regression to highlight the performance improvement yielded by the dynamic method. The experimental comparison is made using Multi-Layer Perceptron predictors on benchmark datasets.

Manuscript from author [PDF]

ES2015-125

Training Multi-Layer Perceptron with Multi-Objective Optimization and Spherical Weights Representation

Honovan Rocha, Marcelo Costa, Antônio Braga

Abstract
This paper proposes a novel representation of the parameters of neural networks in which the weights are projected into a new space defined by a radius $r$ and a vector of angles $\theta$. This spherical representation further simplifies the multi-objective learning problem in which error and norm functions are optimized to generate Pareto sets. Using spherical weights, the error is minimized as a single-objective problem over the vector of angles while the radius (or norm) is kept fixed. Results indicate that spherical weights generate more reliable and accurate Pareto set estimates compared to the standard multi-objective approach.

Manuscript from author [PDF]

ES2015-90

Reducing offline evaluation bias of collaborative filtering

Arnaud de Myttenaere, Boris Golden, Bénédicte Le Grand, Fabrice Rossi

Abstract
Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. This process influences the way users interact with the system and, as a consequence, biases the evaluation of a recommendation algorithm computed using historical data (via offline evaluation). This paper presents the state of the art of the solutions to reduce this bias and a new application to collaborative filtering.

Manuscript from author [PDF]

ES2015-23

A new fuzzy neural system with applications

Yuanyuan Chai, Jun Chen, Wei Luo

Abstract
Through a comprehensive study of existing fuzzy neural systems, this paper presents a Choquet integral-OWA operator based fuzzy neural system named AggFNS as a new hybrid method of CI, which has advantages in universal fuzzy inference operators and importance factor expression during the reasoning process. AggFNS was applied to the traffic level-of-service evaluation problem and the experimental results showed that AggFNS has a strong nonlinear mapping and approximation capability after training, which could be used for complex systems modeling, prediction and control.

Manuscript from author [PDF]

ES2015-126

Measuring scoring efficiency through goal expectancy estimation

Héctor Ruiz, Paulo Lisboa, Paul Neilson, Warren Gregson

Abstract
Association football is characterized by the lowest scoring rate of all major sports. A typical value of less than 3 goals per game makes it difficult to find strong effects on goal scoring. Instead of goals, one can focus on the production of shots, increasing the available sample size. However, the value of shots depends heavily on different factors, and it is important to take this variability into account. In this paper, we use a multilayer perceptron to build a goal expectancy model that estimates the conversion probability of shots, and use it to evaluate the scoring performance of Premier League footballers.

Manuscript from author [PDF]

ES2015-29

Predicting the profitability of agricultural enterprises in dairy farming

Maria Yli-Heikkilä, Jukka Tauriainen, Mika Sulkava

Abstract
Profitability and other economic aspects of agriculture can be analyzed using various machine learning methods. In this paper, we compare linear, additive and recursive-partitioning-based models for predicting the profitability of farms using information easily available to a dairy farmer. We find that an ensemble of recursive partitioning methods provides the best prediction accuracy. We also analyze the importance of the predictor variables. These findings may turn out to be useful in increasing our understanding of the factors affecting farm profitability and developing a web-service for farmers to predict the performance of their own farm enterprise.

Manuscript from author [PDF]

ES2015-67

The use of RBF neural network to predict building’s corners hygrothermal behavior

Roberto Z. Freire, Gerson H. dos Santos, Leandro dos S. Coelho, Viviana C. Mariani, Divani da S. Carvalho

Abstract
In this paper, a radial basis function neural network (RBF-NN) was combined with two optimization techniques: the expectation-maximization clustering method was used to tune the centers of the Gaussian activation functions, and differential evolution was adopted to optimize the spreads and to perform a local search of the centers. The modified RBF-NN was employed to predict the hygrothermal behavior of building corners. These specific regions of buildings are still barely explored due to modelling complexity, high computer run time, numerical divergence and highly moisture-dependent properties. Moreover, these specific building areas are constantly affected by moisture accumulation and mould growth, conditions that favor structural damage.

Manuscript from author [PDF]

ES2015-2

I see you: on neural networks for indoor geolocation

Johannes Pohl, Andreas Noack

Abstract
We propose a new passive system for indoor localization of mobile nodes. After the setup, our system only relies on arbitrary wireless communication from the nodes, whereby neither the mobile nodes nor the communication needs to be under our control. The presented system is composed of three Artificial Neural Networks (ANN) using a radiomap approach and the Received Signal Strength (RSS) for localization. A Probabilistic Neural Network (PNN) decides between two Generalized Regression Neural Networks (GRNN) that process the actual RSS measurement. In practical experiments we achieve a mean location error of 0.58m which is 22.64% better than a single GRNN approach in our setup.

Manuscript from author [PDF]



Feature and kernel learning


ES2015-13

Feature and kernel learning

Veronica Bolon-Canedo, Michele Donini, Fabio Aiolli

Abstract
Feature selection and weighting have been an active research area in the last few decades, finding success in many different applications. With the advent of Big Data, the adequate identification of the relevant features has made feature selection an even more indispensable step. On the other hand, in kernel methods features are implicitly represented by means of feature mappings and kernels. It has been shown that the correct selection of the kernel is a crucial task, since an erroneous selection can lead to poor performance. Unfortunately, manually searching for an optimal kernel is time-consuming and a sub-optimal choice. This tutorial is concerned with the use of data to learn features and kernels automatically. We provide a survey of recent methods developed for feature selection/learning and their application to real world problems, together with a review of the contributions to the ESANN 2015 special session on Feature and Kernel Learning.

Manuscript from author [PDF]

ES2015-52

Discovering temporally extended features for reinforcement learning in domains with delayed causalities

Robert Lieck, Marc Toussaint

Abstract
Discovering temporally delayed causalities from data raises notoriously hard problems in reinforcement learning. In this paper we define a space of temporally extended features, designed to capture such causal structures, using a generating operation. Our discovery algorithm PULSE exploits the generating operation to efficiently discover a sparse subset of features. We provide convergence guarantees and apply our method to train a model-based as well as a model-free agent in different domains. In terms of achieved rewards and the number of required features, our method achieves much better results than other feature expansion methods.

Manuscript from author [PDF]

ES2015-104

ESNigma: efficient feature selection for echo state networks

Davide Bacciu, Filippo Benedetti, Alessio Micheli

Abstract
The paper introduces a feature selection wrapper designed specifically for Echo State Networks. It defines a feature scoring heuristic, applicable to generic subset search algorithms, which reduces the need for model retraining with respect to wrappers in the literature. The experimental assessment on real-world noisy sequential data shows that the proposed method can identify a compact set of relevant, highly predictive features with as little as $60\%$ of the time required by the original wrapper.

Manuscript from author [PDF]

ES2015-83

Learning features on tear film lipid layer classification

Beatriz Remeseiro, Veronica Bolon-Canedo, Amparo Alonso-Betanzos, Manuel G. Penedo

Abstract
Dry eye is a prevalent disease which leads to irritation of the ocular surface, and is associated with symptoms of discomfort and dryness. The Guillon tear film classification system is one of the most common procedures to diagnose this disease. Previous research has demonstrated that this classification can be automated by means of image processing and machine learning techniques. However, all approaches for automatic classification have focused on dark eyes, since they are the most common in humans. This paper introduces a methodology that makes use of feature selection methods to learn which features are the most relevant for each type of eye and, thus, to improve the automatic classification of the tear film lipid layer independently of eye color. Experimental results showed the adequacy of the proposed methodology, achieving classification rates over 90%, while producing unbiased results and working in real time.

Manuscript from author [PDF]

ES2015-114

PCA-based algorithm for feature score measures ensemble construction

Andrey Filchenkov, Vladislav Dolganov, Ivan Smetannikov

Abstract
Feature filtering algorithms are commonly used in feature selection for high-dimensional datasets due to their simplicity and efficacy. Each of these algorithms has its own strengths and weaknesses. An ensemble of different ranking methods is a way to provide a stable and efficacious ranking algorithm. We propose a PCA-based algorithm for building ensembles of filter ranking algorithms. We compared this algorithm with four other rank aggregation algorithms on five different datasets used in the NIPS-2003 feature selection challenge. We evaluated the stability of the resulting rankings and the AUC score for four classifiers learnt on the resulting feature sets. The proposed method has shown better stability and above-average efficacy.

Manuscript from author [PDF]



Graphs in machine learning


ES2015-14

Graphs in machine learning. An introduction

Pierre Latouche, Fabrice Rossi

Abstract

Manuscript from author [PDF]

ES2015-130

Exploiting the ODD framework to define a novel effective graph kernel

Giovanni Da San Martino, Nicolò Navarin, Alessandro Sperduti

Abstract
In this paper, we show how the Ordered Decomposition DAGs kernel framework, a framework that allows the definition of graph kernels from tree kernels, makes it easy to define new state-of-the-art graph kernels. Here we consider a quite fast graph kernel based on the Subtree kernel (ST), and we improve it by increasing its expressivity through new features involving partial trees. While the worst-case complexity of the newly obtained graph kernel does not increase, its effectiveness is improved, as shown on several chemical datasets, reaching state-of-the-art performance.

Manuscript from author [PDF]

ES2015-106

Exact ICL maximization in a non-stationary time extension of latent block model for dynamic networks

Marco Corneli, Pierre Latouche, Fabrice Rossi

Abstract
The latent block model (LBM) is a powerful probabilistic tool to describe interactions between node sets in bipartite networks, but it does not account for interactions of time-varying intensity between nodes in unknown classes. Here we propose a non-stationary temporal extension of the LBM that simultaneously clusters the two node sets of a bipartite network and constructs classes of time intervals on which interactions are stationary. The number of clusters as well as the class memberships are obtained by maximizing the exact complete-data integrated likelihood by means of a greedy search approach. Experiments on simulated and real data illustrate the potential of such a model.

Manuscript from author [PDF]

ES2015-87

A State-Space Model for the Dynamic Random Subgraph Model

Rawya Zreik, Pierre Latouche, Charles Bouveyron

Abstract
In recent years, many random graph models have been proposed to extract information from networks. The principle is to look for groups of vertices with homogeneous connection profiles. Most of these models are suitable for static networks and can handle different types of edges. This work is motivated by the need to analyze an evolving network describing email communications between employees of the Enron company, where social positions play an important role. Therefore, in this paper, we consider the random subgraph model (RSM), which was proposed recently to model networks through latent clusters built within known partitions. Using a state-space model to characterize the cluster proportions, RSM is then extended in order to deal with dynamic networks. We call the latter the dynamic random subgraph model (dRSM).

Manuscript from author [PDF]

ES2015-132

Gabriel Graph for Dataset Structure and Large Margin Classification: A Bayesian Approach

Luiz Carlos Torres, Cristiano Castro, Antônio Braga

Abstract
This paper presents a geometrical approach for obtaining large margin classifiers. The method aims at exploring the geometrical properties of the dataset from the structure of a Gabriel graph, which represents pattern relations according to a given distance metric, such as the Euclidean distance. Once the graph is generated, geometric vectors, analogous to SVM's support vectors, are obtained in order to yield the final large margin solution from a Gaussian mixture model approach. Preliminary experiments have shown that the solutions obtained with the proposed method are close to those obtained with SVMs.
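
For reference, the Gabriel graph itself has a very simple characterisation: points $i$ and $j$ are joined iff no third point lies inside the ball having the segment $ij$ as diameter, i.e. $d(i,k)^2 + d(j,k)^2 \ge d(i,j)^2$ for every other point $k$. A brute-force sketch (not the paper's implementation):

    import numpy as np

    def gabriel_edges(X):
        # edge (i, j) iff no other point k falls inside the ball with diameter ij
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared distances
        n = len(X)
        return [(i, j) for i in range(n) for j in range(i + 1, n)
                if all(D2[i, k] + D2[j, k] >= D2[i, j] for k in range(n) if k not in (i, j))]

    X = np.random.default_rng(4).random((20, 2))
    print(gabriel_edges(X))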

Manuscript from author [PDF]



Manifold learning and optimization


ES2015-109

Supervised Manifold Learning with Incremental Stochastic Embeddings

Oliver Kramer

Abstract
In this paper, we introduce an incremental dimensionality reduction approach for labeled data. The algorithm incrementally samples in latent space and chooses a solution that minimizes the nearest neighbor classification error taking into account label information. We introduce and compare two optimization approaches to generate supervised embeddings, i.e., an incremental solution construction method and a re-embedding approach. Both methods have in common that the objective is to minimize the nearest neighbor classification error computed in the low-dimensional space. The resulting embedding is a surrogate of the high-dimensional labeled set. The set allows conclusions about the data set structure and can be used as preprocessing step for classification of labeled patterns.

Manuscript from author [PDF]

ES2015-99

Rank-constrained optimization: a Riemannian manifold approach

Guifang Zhou, Wen Huang, Gallivan Kyle, Van Dooren Paul, Pierre-Antoine Absil

Abstract
This paper presents an algorithm that solves optimization problems on a matrix manifold $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ with an additional rank inequality constraint. New geometric objects are defined to facilitate efficiently finding a suitable rank. The convergence properties of the algorithm are given and a weighted low-rank approximation problem is used to illustrate the efficiency and effectiveness of the algorithm.

Manuscript from author [PDF]

ES2015-131

Asynchronous decentralized convex optimization through short-term gradient averaging

Jérôme FELLUS, David Picard, Philippe-Henri Gosselin

Abstract
This paper considers decentralized convex optimization over a network in large-scale contexts, where large simultaneously applies to the number of training examples, the dimensionality and the number of networking nodes. We first propose a centralized optimization scheme that generalizes successful existing methods based on gradient averaging, improving their flexibility by making the number of averaged gradients an explicit parameter of the method. We then propose an asynchronous distributed algorithm that implements this original scheme for large decentralized computing networks.

Manuscript from author [PDF]



Feature and model selection, sparse models


ES2015-50

Model Selection for Big Data: Algorithmic Stability and Bag of Little Bootstraps on GPUs

Luca Oneto, Bernardo Pilarz, Alessandro Ghio, Davide Anguita

Abstract
Model selection is a key step in learning from data, because it allows optimal models to be selected, avoiding both under- and over-fitting. However, in the Big Data framework, the effectiveness of a model selection approach is assessed not only through the accuracy of the learned model but also through the time and computational resources needed to complete the procedure. In this paper, we propose two model selection approaches for Least Squares Support Vector Machine (LS-SVM) classifiers, based on Fully-empirical Algorithmic Stability (FAS) and Bag of Little Bootstraps (BLB). The two methods scale sub-linearly with respect to the size of the learning set and, therefore, are well suited for big data applications. Experiments are performed on a Graphical Processing Unit (GPU), showing up to 30x speed-ups with respect to conventional CPU-based implementations.

Manuscript from author [PDF]

ES2015-95

Solving constrained Lasso and Elastic Net using nu-SVMs

Carlos M. Alaíz, Alberto Torres, José R. Dorronsoro

Abstract
Many important linear sparse models have at their core the Lasso problem, for which the GLMNet algorithm is often considered the current state of the art. Recently M. Jaggi observed that Constrained Lasso (CL) can be reduced to an SVM-like problem, which opens the way to using efficient SVM algorithms to solve CL. We refine Jaggi's arguments to reduce CL as well as constrained Elastic Net to a Nearest Point Problem and show experimentally that the well-known LIBSVM library results in faster convergence than GLMNet for small problems and also, if properly adapted, for larger ones.

Manuscript from author [PDF]

ES2015-10

Assessment of feature saliency of MLP using analytic sensitivity

Tommi Kärkkäinen

Abstract
A novel technique to determine the saliency of features for the multilayer perceptron (MLP) neural network is presented. It is based on the analytic derivative of the feedforward mapping with respect to inputs, which is then integrated over the training data using the mean of the absolute values. Experiments demonstrating the viability of the approach are given with small benchmark data sets. The cross-validation based framework for reliable determination of MLP that has been used in the experiments was introduced in Kärkkäinen et al. (ESANN 2014, pp. 213-218) and Kärkkäinen (LNCS 8621, pp. 291-300).

Manuscript from author [PDF]

ES2015-41

Morisita-based feature selection for regression problems

Jean Golay, Michael Leuenberger, Mikhaïl Kanevski

Abstract
Data acquisition, storage and management have improved, while the factors behind many phenomena are still not well known. Consequently, irrelevant and redundant features artificially increase the size of datasets, which complicates learning tasks such as regression. To address this problem, feature selection methods have been proposed. This research introduces a new supervised filter based on the Morisita estimator of intrinsic dimension. The algorithm is simple and does not rely on arbitrary parameters. It is applied to both synthetic and real data and a comparison with a wrapper based on an extreme learning machine is conducted.

Manuscript from author [PDF]

ES2015-48

A new genetic algorithm for multi-label correlation-based feature selection

Suwimol Jungjit, Alex Freitas

Abstract
This paper proposes a new Genetic Algorithm for Multi-Label Correlation-Based Feature Selection (GA-ML-CFS). This GA performs a global search in the space of candidate feature subsets, in order to select a high-quality feature subset that is used by a multi-label classification algorithm – in this work, the Multi-Label k-NN algorithm. We compare the results of GA-ML-CFS with the results of the previously proposed Hill-Climbing for Multi-Label Correlation-Based Feature Selection (HC-ML-CFS), across 10 multi-label datasets.

Manuscript from author [PDF]

ES2015-102

Search Strategies for Binary Feature Selection for a Naive Bayes Classifier

Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi

Abstract
We compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability outperform filter approaches while retaining a reasonable computational cost.
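
A generic greedy forward wrapper guided by the cross-validated NBC accuracy, one possible instance of the family of strategies compared here rather than the paper's exact procedure, can be sketched with scikit-learn as follows (the dataset is synthetic and purely illustrative).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB

    X, y = make_classification(n_samples=300, n_features=30, n_informative=5, random_state=0)
    X = (X > 0).astype(int)                      # binary indicators, as in the abstract

    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = {f: cross_val_score(BernoulliNB(), X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        f, s = max(scores.items(), key=lambda kv: kv[1])
        if s <= best:                            # stop when no candidate improves the NBC estimate
            break
        selected.append(f); remaining.remove(f); best = s

    print(selected, best)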

Manuscript from author [PDF]



Advances in learning analytics and educational data mining


ES2015-18

Advances in learning analytics and educational data mining

Mehrnoosh Vahdat, Alessandro Ghio, Luca Oneto, Davide Anguita, Mathias Funk, Matthias Rauterberg

Abstract
The growing interest in recent years towards Learning Analytics (LA) and Educational Data Mining (EDM) has enabled novel approaches and advancements in educational settings. The wide variety of research and practice in this context has opened up important possibilities and applications, from the adaptation and personalization of Technology Enhanced Learning (TEL) systems to the improvement of instructional design and pedagogy choices based on students' needs. LA and EDM play an important role in enhancing learning processes by offering innovative methods for the development and integration of more personalized, adaptive, and interactive educational environments. This has motivated the organization of the ESANN 2015 Special Session on Advances in Learning Analytics and Educational Data Mining. Here, a review of research and practice in LA and EDM is presented, accompanied by the most central methods, benefits, and challenges of the field. Additionally, this paper covers a review of the novel contributions to the Special Session.

Manuscript from author [PDF]

ES2015-43

Adaptive structure metrics for automated feedback provision in Java programming

Benjamin Paassen, Bassam Mokbel, Barbara Hammer

Abstract
Today's learning support systems for programming mostly rely on pre-coded feedback provision, such that their applicability is restricted to modelled tasks. In this contribution, we investigate the suitability of machine learning techniques to automate this process by means of a presentation of similar solution strategies from a set of stored examples. To this end we apply structure metric learning methods in local and global alignment, which can be used to compare Java programs. We demonstrate that automatically adapted metrics better identify the underlying programming strategy as compared to their default counterparts in a benchmark example from programming.

Manuscript from author [PDF]

ES2015-49

Human Algorithmic Stability and Human Rademacher Complexity

Mehrnoosh Vahdat, Luca Oneto, Alessandro Ghio, Davide Anguita, Mathias Funk, Matthias Rauterberg

Abstract
In Machine Learning (ML), the learning process of an algorithm given a set of evidence is studied via complexity measures. The way towards using ML complexity measures in the Human Learning (HL) domain has been paved by a previous study, which introduced Human Rademacher Complexity (HRC); in this work, we introduce Human Algorithmic Stability (HAS). Exploratory experiments, performed on a group of students, show the superiority of HAS over HRC, since HAS allows grasping the nature and complexity of the task to learn.

Manuscript from author [PDF]

ES2015-86

High-School Dropout Prediction Using Machine Learning: A Danish Large-scale Study

Nicolae-Bogdan Sara, Rasmus Halland, Christian Igel, Stephen Alstrup

Abstract
Pupils who do not finish their secondary education are a major societal problem. Previous studies indicate that machine learning can be used to predict high-school dropout, which allows early interventions. To the best of our knowledge, this paper presents the first large-scale study of that kind. It considers pupils that were at least six months into their Danish high-school education, with the goal of predicting dropout in the subsequent three months. We combined information from the MaCom Lectio study administration system, which is used by most Danish high schools, with data from public online sources (name database, travel planner, governmental statistics). In contrast to existing studies that were based on only a few hundred students, we considered a considerably larger sample of 36299 pupils for training and 36299 for testing. We evaluated different machine learning methods. A random forest classifier achieved an accuracy of 93.47% and an area under the curve of 0.965. Given the large sample, we conclude that machine learning can be used to reliably detect high-school dropout given the information already available to many schools.

Manuscript from author [PDF]

ES2015-22

The prediction of learning performance using features of note taking activities

Minoru Nakayama, Kouichi Mutsuura, Hiroh Yamamoto

Abstract
To promote effective learning in online learning environments, the prediction of learning performance is necessary, using various features of learning behaviour. In a blended learning course, participants' note-taking activity reflects learning performance, and the possibility of predicting performance in final exams is examined using metrics of participants' characteristics and features of the contents of notes taken during the course. According to the prediction results, features of note-taking activities are a significant source of information for predicting final exam scores. Also, the accuracy of this prediction was evaluated with respect to factors of the feature extraction procedure and the course instructions.

Manuscript from author [PDF]

ES2015-113

Enhancing learning at work. How to combine theoretical and data-driven approaches, and multiple levels of data?

Virpi Kalakoski, Henriikka Ratilainen, Linda Drupsteen

Abstract
This research plan focuses on learning at work. Our aim is to gather empirical data on multiple factors that can affect learning for work, and to apply computational methods in order to understand the preconditions of effective learning. The design will systematically combine theory- and data-driven approaches to study (i) whether principles of effective learning found in previous studies apply to real life settings, (ii) what interactions between individual and organizational factors are related to learning outcomes, and (iii) new connections and phenomena relevant to enhance learning in real life.

Manuscript from author [PDF]

ES2015-24

Weighted Clustering of Sparse Educational Data

Mirka Saarela, Tommi Kärkkäinen

Abstract
Clustering as an unsupervised technique is predominantly used in unweighted settings. In this paper, we present an efficient version of a robust clustering algorithm for sparse educational data that takes the weights, aligning a sample with the corresponding population, into account. The algorithm is utilized to divide the Finnish student population of PISA 2012 (the latest data from the Programme for International Student Assessment) into groups, according to their attitudes and perceptions towards mathematics, for which one third of the data is missing. Furthermore, necessary modifications of three cluster indices to reveal an appropriate number of groups are proposed and demonstrated.

Manuscript from author [PDF]



Classification


ES2015-32

An affinity matrix approach for structure selection of extreme learning machines

David Pinto, Andre Lemos, Antônio Braga

Abstract
This paper proposes a novel pruning approach for Extreme Learning Machines. Hidden neuron ranking and selection are performed using a priori information expressed by affinity matrices. We show that the similarity between the affinity matrix of the input patterns and the affinity matrix of the hidden layer output patterns can be seen as a measure of how much of the data structure is retained through the network. However, beyond a certain similarity level, adding new hidden nodes has little or no effect on the amount of information propagated from the input. The proposed approach automatically determines this level and hence the suitable number of hidden nodes. Experiments are performed on classification problems to validate the proposed approach.

Manuscript from author [PDF]

ES2015-80

A generalised label noise model for classification

Jakramate Bootkrajang

Abstract
Learning from labelled data is becoming more and more challenging due to inherent imperfection of training labels. In this paper, we propose a new, generalised label noise model which is able to withstand the negative effect of both random noise and a wide range of non-random label noises. Empirical studies using three real-world datasets with inherent annotation errors demonstrate that the proposed generalised label noise model improves, in terms of classification accuracy, over existing label noise modelling approaches.

Manuscript from author [PDF]

ES2015-84

On the use of machine learning techniques for the analysis of spontaneous reactions in automated hearing assessment

Veronica Bolon-Canedo, Alba Fernández, Amparo Alonso-Betanzos, Marcos Ortega, Manuel G. Penedo

Abstract
Lack of hearing is one of the most frequent sensory deficits among the elderly population. Its correct assessment becomes complicated for audiologists when there are severe difficulties in communicating with the patient. To facilitate this task, this paper proposes a methodology for the correct classification of eye gestural reactions to auditory stimuli using machine learning approaches. After extracting the features from the existing videos, we applied several classifiers and managed to improve the detection of the most important classes through the use of oversampling techniques in a novel way. This methodology showed promising results, with true positive rates over 0.96 for the critical classes and global classification rates over 97%, paving the way to its inclusion in a fully automated tool.

Manuscript from author [PDF]

ES2015-120

Combining higher-order N-grams and intelligent sample selection to improve language modeling for Handwritten Text Recognition

Jafar Tanha, Jesse De Does, Katrien Depuydt

Abstract
We combine two techniques to improve the language modeling component of a Handwritten Text Recognition (HTR) system. On the one hand, we apply a previously developed intelligent sample selection approach to language model adaptation for handwritten text recognition, which exploits a combination of in-domain and out-of-domain data for construction of language models. On the other hand, we apply rescoring methods to enable more complex language modeling in HTR. It is shown that these techniques complement each other very well, and that the combination leads to a significant error reduction in a practical HTR task for historical data.

Manuscript from author [PDF]

ES2015-40

Learning Sparse Feature Representations using Probabilistic Quadtrees and Deep Belief Nets

Saikat Basu, Manohar Karki, Sangram Ganguly, Robert DiBiano, Supratik Mukhopadhyay, Ramakrishna Nemani

Abstract
Learning sparse feature representations is a useful instrument for solving an unsupervised learning problem. In this paper, we present three labeled handwritten digit datasets, collectively called n-MNIST. Then, we propose a novel framework for the classification of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets. On the MNIST and n-MNIST datasets, our framework shows promising results and significantly outperforms traditional Deep Belief Networks.

Manuscript from author [PDF]

ES2015-61

Optimal transport for semi-supervised domain adaptation

Denis Rousselle, Stéphane Canu

Abstract
Domain adaptation for semi-supervised learning is still a challenging task. Indeed, available solutions are often slow and fail to provide relevant interpretations. Here we propose a new algorithm to solve this problem of semi-supervised domain adaptation efficiently, by using an adapted combination of transportation algorithms. Our empirical evidence supports our initial intuition showing the interest of the proposed method.

Manuscript from author [PDF]

ES2015-46

Resource-efficient Incremental learning in very high dimensions

Alexander Gepperth, Mathieu Lefort, Thomas Hecht

Abstract
We propose a three-layer neural architecture for incremental multi-class learning that remains resource-efficient even when the number of input dimensions is very high ($\ge 1000$). This so-called projection-prediction (PROPRE) architecture is strongly inspired by biological information processing in that it uses a prototype-based, topologically organized hidden layer trained with the SOM learning rule controlled by a global, task-related error signal. Furthermore, the SOM learning adapts only the weights of localized neural sub-populations that are similar to the input, which explicitly avoids the catastrophic forgetting effect of MLPs when new input statistics are presented to the architecture. As the readout layer uses simple linear regression, the approach essentially applies locally linear models to "receptive fields" (RF) defined by SOM prototypes, whereas RF shape is implicitly defined by adjacent prototypes (which avoids the storage of covariance matrices that becomes prohibitive for high input dimensionality). Both RF centers and shapes are jointly adapted w.r.t. input statistics and the classification task. Tests on the MNIST dataset show that the algorithm compares favorably to the state-of-the-art LWPR algorithm at vastly decreased resource requirements.

Manuscript from author [PDF]

ES2015-5

One-vs-all binarization technique in the context of random forest

Md Nasim Adnan, Md Zahidul Islam

Abstract
Binarization techniques are widely used to solve multi-class classification problems. These techniques reduce the classification complexity of multi-class classification problems by dividing the original data set into two-class segments or replicas. Then a set of simpler classifiers is learnt from the two-class segments or replicas. The outputs of these classifiers are combined for final classification. Binarization can improve prediction accuracy when compared to a single classifier. However, to be declared a superior technique, binarization techniques need to prove themselves in the context of ensemble classifiers such as Random Forest. Random Forest is a state-of-the-art, popular decision forest building algorithm which focuses on generating diverse decision trees as the base classifiers. In this paper we evaluate the one-vs-all binarization technique in the context of Random Forest. We present an elaborate experimental study involving ten widely used data sets from the UCI Machine Learning Repository. The experimental results exhibit the effectiveness of the one-vs-all binarization technique in the context of Random Forest.
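
The one-vs-all setup itself is a one-liner in scikit-learn; the toy comparison below (on Iris, purely illustrative, whereas the paper uses ten UCI data sets) contrasts a plain Random Forest with its one-vs-all binarized counterpart.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_iris(return_X_y=True)
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    ova_rf = OneVsRestClassifier(RandomForestClassifier(n_estimators=100, random_state=0))

    print("plain RF:", cross_val_score(rf, X, y, cv=5).mean())
    print("OvA RF  :", cross_val_score(ova_rf, X, y, cv=5).mean())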

Manuscript from author [PDF]

ES2015-21

Improving the random forest algorithm by randomly varying the size of the bootstrap samples for low dimensional data sets

Md Nasim Adnan, Md Zahidul Islam

Abstract
The Random Forest algorithm generates quite diverse decision trees as the base classifiers for high dimensional data sets. However, for low dimensional data sets the diversity among the trees falls sharply. In Random Forest, the size of the bootstrap samples generally remains the same every time to generate a decision tree as the base classifier. In this paper we propose to vary the size of the bootstrap samples randomly within a predefined range in order to increase diversity among the trees. We conduct an elaborate experimentation on several low dimensional data sets from UCI Machine Learning Repository. The experimental results show the effectiveness of our proposed technique.
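
A hedged sketch of the idea, in which each tree receives a bootstrap sample whose size is drawn uniformly from a predefined range rather than being fixed at $n$, is given below; the range $[0.5n, n]$ and the majority vote are illustrative choices, not necessarily those of the paper.

    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng, n, trees = np.random.default_rng(0), len(X_tr), []
    for _ in range(100):
        m = int(rng.integers(int(0.5 * n), n + 1))   # bootstrap size varies per tree
        idx = rng.integers(0, n, size=m)             # sample with replacement
        k = max(1, int(np.sqrt(X.shape[1])))         # random-subspace splits, as in Random Forest
        trees.append(DecisionTreeClassifier(max_features=k, random_state=0).fit(X_tr[idx], y_tr[idx]))

    votes = np.array([t.predict(X_te) for t in trees])   # majority vote over the forest
    y_pred = np.array([np.bincount(votes[:, i]).argmax() for i in range(votes.shape[1])])
    print("accuracy:", np.mean(y_pred == y_te))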

Manuscript from author [PDF]

ES2015-112

An Ensemble Learning Technique for Multipartite Ranking

Stéphan Clémençon, Sylvain Robbiano

Abstract
Decision tree induction algorithms, possibly combined with a consensus technique, have recently been successfully extended to multipartite ranking. It is the goal of this paper to address certain aspects of their weakness, namely instability and lack of smoothness, by proposing dedicated ensemble learning strategies. As shown by numerical experiments, bootstrap aggregation combined with a certain amount of feature randomization dramatically improves the performance of such ranking methods, in terms of both accuracy and robustness.

Manuscript from author [PDF]

ES2015-82

Online multiclass learning with "bandit" feedback under a Passive-Aggressive approach

Hongliang Zhong, Emmanuel Daucé, Liva Ralaivola

Abstract
This paper presents a new approach to online multi-class learning with bandit feedback. The algorithm, named PAB (Passive-Aggressive in Bandit), is a variant of the Online Passive-Aggressive algorithm proposed by [Crammer, 2006], the latter being an effective framework for max-margin online learning. We analyze some of its operating principles and show that it provides a good and scalable solution to the bandit classification problem, in particular on a real-world dataset where it is found to outperform the best existing methods.
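
A toy illustration of max-margin online updates under bandit (correct/incorrect) feedback; this is not the paper's PAB algorithm, and the update rule and epsilon-greedy exploration are simplified assumptions for orientation only.

    import numpy as np

    rng = np.random.default_rng(0)
    n_classes, dim = 5, 20
    W = np.zeros((n_classes, dim))                     # one weight vector per class

    def step(x, true_label, C=1.0, eps=0.1):
        scores = W @ x
        pred = rng.integers(n_classes) if rng.random() < eps else scores.argmax()
        correct = (pred == true_label)                 # the only feedback available
        # hinge-style loss on the played class only (bandit setting)
        loss = max(0.0, 1.0 - scores[pred]) if correct else max(0.0, 1.0 + scores[pred])
        tau = min(C, loss / (x @ x + 1e-12))           # passive-aggressive step size
        W[pred] += tau * x if correct else -tau * x
        return pred, correct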

Manuscript from author [PDF]

ES2015-85

Data Analytics for Drilling Operational States Classifications

Galina Veres, Zoheir Sabeur

Abstract
This paper provides benchmarks for identifying the best-performing classifiers for the detection of operational states in industrial drilling operations. Multiple scenarios for the detection of the operational states are tested on a rig with various drilling wells. Drilling data are extremely challenging due to their non-linear and stochastic nature, compounded by embedded noise and class imbalance. Nevertheless, it is possible to deploy robust classifiers that overcome these challenges and achieve good automated detection of states. Three classifiers with the best classification rates for drilling operational states were identified in this study.

Manuscript from author [PDF]

ES2015-79

Prediction of concrete carbonation depth using decision trees

Woubishet Taffese, Esko Sistonen , Jari Puttonen

Abstract
In this paper, three carbonation depth prediction models based on decision trees are developed. Carbonation is, in urban areas, the major cause of reinforcement steel corrosion, which leads to premature degradation and loss of serviceability and safety of reinforced concrete structures. The adopted decision trees are a regression tree, a bagged ensemble, and a reduced bagged ensemble regression tree. The evaluation of the models' prediction performance reveals that all three perform reasonably well. Among them, the reduced bagged ensemble regression tree has the highest prediction and generalization capability.
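
A hedged sketch of the model family compared (a single regression tree versus a bagged ensemble of regression trees) using scikit-learn on synthetic placeholder features; the paper's actual carbonation dataset and feature set are not reproduced here.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import BaggingRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))                      # placeholder predictors (assumed)
    y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)  # synthetic target

    single_tree = DecisionTreeRegressor(max_depth=5)
    bagged = BaggingRegressor(DecisionTreeRegressor(max_depth=5), n_estimators=50)
    for name, model in [("regression tree", single_tree), ("bagged ensemble", bagged)]:
        print(name, cross_val_score(model, X, y, cv=5, scoring="r2").mean())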

Manuscript from author [PDF]

ES2015-66

Powered-Two-Wheeler safety critical events recognition using a mixture model with quadratic logistic functions

Ferhat Attal, Abderrahmane Boubezoul, Allou Samé, Latifa Oukhellou

Abstract
This paper presents a simple and efficient methodology that uses both acceleration and angular velocity signals to detect critical safety events for Powered Two Wheelers (PTW). Recognition of critical events is performed in two steps: (1) a feature extraction step, in which the multidimensional time trajectories of accelerometer/gyroscope data are modelled and segmented using a specific mixture model with quadratic logistic functions; (2) a classification step, which uses the k-nearest neighbor (k-NN) algorithm to assign each trajectory, characterized by its extracted features, to one of three classes, namely Fall, near Fall and Naturalistic riding. The results show the ability of the proposed methodology to detect critical safety events for PTW.

Manuscript from author [PDF]

[Back to Top]


Image processing and vision systems


ES2015-76

Real-time activity recognition via deep learning of motion features

Kishore Konda, Pramod Chandrashekhariah, Roland Memisevic, Jochen Triesch

Abstract
Activity recognition is a challenging computer vision problem with countless applications. Here we present a real-time activity recognition system using deep learning of local motion feature representations. Our approach learns to directly extract energy-based motion features from video blocks. We implement the system on a distributed computing architecture and evaluate its performance on the iCub humanoid robot. We demonstrate real-time performance using GPUs, paving the way for wide deployment of activity recognition systems in real-world scenarios.

Manuscript from author [PDF]

ES2015-115

Designing semantic feature spaces for brain-reading

Luepol Pipanmaekaporn, Ludmilla Tajtelbom, Vincent Guigue, Thierry Artieres

Abstract
We focus on a brain-reading task which consists in discovering the word a person is thinking of from an fMRI image of their brain. Previous studies have demonstrated the feasibility of this brain-reading task through the design of what has been called a semantic space, i.e. a continuous low-dimensional space reflecting the similarity between words. Up to now, better results have been achieved when carefully designing the semantic space by hand, which limits the generality of the method. We propose to automatically design several semantic spaces from linguistic resources and to combine them in a principled way so as to reach results as accurate as those obtained with a manually built semantic space.

Manuscript from author [PDF]

ES2015-124

Learning objects from RGB-D sensors using point cloud-based neural networks

Marcelo Borghetti Soares, Pablo Barros, German Ignacio Parisi, Stefan Wermter

Abstract
In this paper we present a scene understanding approach for assistive robotics based on learning to recognize different objects from RGB-D devices. Using the depth information it is possible to compute descriptors that capture the geometrical relations among the points that constitute an object, or to extract features from multiple viewpoints. We developed a framework for testing different neural models that receive this depth information as input. We also propose a novel approach using three-dimensional RGB-D information as input to Convolutional Neural Networks. We found F1-scores greater than 0.9 for the majority of the objects tested, showing that the adopted approach is also effective for classification.

Manuscript from author [PDF]

ES2015-116

A Robust Neural Robot Navigation Using a Combination of Deliberative and Reactive Control Architectures

Dalia Marcela Rojas Castro, Arnaud Revel, Michel Ménard

Abstract
This paper proposes a hybrid neural-based control architecture for robot indoor navigation. This architecture preserves all the advantages of reactive architectures such as rapid responses to unforeseen problems in dynamic environments while combining them with the global knowledge of the world used in deliberative architectures. In order to take the right decision during navigation, the reactive module allows the robot to corroborate the dynamic visual perception with the a priori knowledge of the world gathered from a previously examined floor plan. Experiments with the robot functioning based on the proposed architecture in a simple navigation scenario prove the feasibility of the approach.

Manuscript from author [PDF]

ES2015-122

Robust Visual Terrain Classification with Recurrent Neural Networks

Sebastian Otte, Stefan Laible, Richard Hanten, Marcus Liwicki, Andreas Zell

Abstract
A novel approach for robust visual terrain classification, based on generating feature sequences from repeatedly mutated image patches, is presented. These sequences, which describe how the feature vector progresses under a certain image operation, are learned with Recurrent Neural Networks (RNNs). The approach is studied for image-patch-based terrain classification for wheeled robots. Various RNN architectures, namely standard RNNs, Long Short Term Memory networks (LSTMs), Dynamic Cortex Memory networks (DCMs), as well as bidirectional variants of these architectures, are investigated and compared to recently used state-of-the-art methods for real-time terrain classification. The results show that the presented approach significantly outperforms previous methods.

Manuscript from author [PDF]

ES2015-89

Revisiting ant colony algorithms to seismic faults detection

Walther Maciel, Cristina Vasconcelos, Pedro Silva, Marcelo Gattass

Abstract
Seismic fault extraction is a time-consuming task that can be aided by image enhancement of fault areas. The recent literature addresses this task by using ant colony optimization (ACO) algorithms to highlight the fault edges. This work proposes improvements to current state-of-the-art methodologies by revisiting and/or reincorporating classic aspects of ACO, such as ant distribution and pheromone evaporation and deposition, not previously considered in this seismic fault enhancement scenario. The proposed approach achieves good results, producing images with little noise and good localization of fault edges.
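
For orientation, the classic ACO pheromone update combining evaporation and deposition (the standard textbook form, not necessarily the exact variant adopted in this work) reads

    $\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k},$

where $\rho \in (0,1]$ is the evaporation rate and $\Delta\tau_{ij}^{k}$ is the pheromone deposited on edge $(i,j)$ by ant $k$.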

Manuscript from author [PDF]

ES2015-101

Depth and height aware semantic RGB-D perception with convolutional neural networks

Hannes Schulz, Nico Höft, Sven Behnke

Abstract
Convolutional neural networks are popular for image labeling tasks because of their built-in translation invariance. They do not adapt well to scale changes, however, and cannot easily adjust to classes which regularly appear in certain scene regions. This is especially true when the network is applied in a sliding window. When depth data is available, we can address both problems. We propose to adjust the size of the processed windows to the depth and to supply the inferred height above ground to the network, which significantly improves object-class segmentation results on the NYU depth dataset.

Manuscript from author [PDF]

ES2015-136

A simple technique for improving multi-class classification with neural networks

Thomas Kopinski, Alexander Gepperth, Uwe Handmann

Abstract
We present a novel method to perform multi-class pattern classification with neural networks and test it on a challenging 3D hand gesture recognition problem. Our method consists of a standard one-against-all (OAA) classification, followed by another network layer classifying the resulting class scores, possibly augmented by the original raw input vector. This allows the network to disambiguate hard-to-separate classes as the distribution of class scores carries considerable information as well, and is in fact often used for assessing the confidence of a decision. We show that by this approach we are able to significantly boost our results, overall as well as for particular difficult cases, on the hard 10-class gesture classification task.
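
A minimal sketch of the described two-stage scheme: a one-against-all classifier followed by a second classifier whose input is the class-score vector, optionally concatenated with the raw input. Models, data split and dataset are placeholders, not the paper's networks or 3D gesture data.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)                # placeholder multi-class data
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=0)

    stage1 = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(X1, y1)
    scores2 = stage1.predict_proba(X2)                 # OAA-style class scores on held-out data
    stage2_in = np.hstack([scores2, X2])               # class scores + raw input
    stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(stage2_in, y2)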

Manuscript from author [PDF]

ES2015-128

Dynamic gesture recognition using Echo State Networks

Doreen Jirak, Pablo Barros, Stefan Wermter

Abstract
In the last decade, training recurrent neural networks (RNN) with techniques from the area of reservoir computing (RC) became popular for learning sequential data due to the ease of network training. Although successfully applied in language and speech research, little is known about using RC techniques for dynamic gesture recognition. We therefore conduct experiments on command gestures using Echo State Networks (ESN) to investigate both the effect of different gesture sequence representations and different parameter configurations. For recognition we employ an ensemble technique, i.e. using ESNs as weak classifiers. Our results show that using ESNs is a promising approach, and we give indications for future experiments in this research area.

Manuscript from author [PDF]

ES2015-127

A flat neural network architecture to represent movement primitives with integrated sequencing

Andre Lemme, Jochen Steil

Abstract
The paper proposes a minimalistic network to learn a set of movement primitives and their sequencing in one single feedforward network. Utilizing an extreme learning machine with output feedback and a simple inhibition mechanism, this approach can sequence movement primitives efficiently with a very moderate network size. It can interpolate movement primitives to create new motions. This work thus demonstrates that an unspecific single hidden layer, that is, a flat representation, is sufficient to efficiently compose complex sequences, a task which usually requires hierarchy, multiple timescales and multi-level control mechanisms.

Manuscript from author [PDF]

[Back to Top]


Unsupervised nonlinear dimensionality reduction


ES2015-16

Unsupervised dimensionality reduction: the challenge of big data visualization

Kerstin Bunte, John Aldo Lee

Abstract

Manuscript from author [PDF]

ES2015-37

Autoencoding time series for visualisation

Nikolaos Gianniotis, Sven Dennis Kügler, Peter Tino, Kai Polsterer, Ranjeev Misra

Abstract
We present an algorithm for the visualisation of time series. To that end we employ echo state networks to convert time series into a suitable vector representation which is capable of capturing the latent dynamics of the time series. Subsequently, the obtained vector representations are put through an autoencoder and the visualisation is constructed using the activations of the “bottleneck”. The crux of the work lies in defining an objective function that quantifies the reconstruction error of these representations in a principled manner. We demonstrate the method on synthetic and real data.

Manuscript from author [PDF]

ES2015-97

Diffusion Maps parameters selection based on neighbourhood preservation

Carlos M. Alaíz, Ángela Fernández, José R. Dorronsoro

Abstract
Diffusion Maps is one of the leading methods for dimensionality reduction, although it requires fixing a certain number of parameters that can be crucial for its performance. This parameter selection is usually based on the expertise of the user, as there is no unified criterion for evaluating the quality of the embedding. We propose to use a neighbourhood preservation measure as the criterion for fixing these parameters. As we shall see, this approach provides good embedding parameters without needing problem-specific knowledge.
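
A hedged sketch of parameter selection by neighbourhood preservation: for each candidate parameter value, embed the data and score the average overlap between k-nearest-neighbour sets computed before and after embedding. SpectralEmbedding stands in here for Diffusion Maps, and the exact preservation measure used in the paper may differ.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.manifold import SpectralEmbedding
    from sklearn.neighbors import NearestNeighbors

    k = 10                                             # neighbourhood size (assumed)

    def knn_sets(Z):
        nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
        return nn.kneighbors(Z, return_distance=False)[:, 1:]   # drop the point itself

    X, _ = load_digits(return_X_y=True)                # placeholder dataset
    ref = knn_sets(X)
    for gamma in [1e-4, 1e-3, 1e-2]:                   # candidate kernel widths
        Z = SpectralEmbedding(n_components=2, affinity="rbf", gamma=gamma).fit_transform(X)
        emb = knn_sets(Z)
        preservation = np.mean([len(set(a) & set(b)) / k for a, b in zip(ref, emb)])
        print(gamma, preservation)

The parameter value with the highest preservation score would then be retained.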

Manuscript from author [PDF]

ES2015-134

Unsupervised Dimensionality Reduction for Transfer Learning

Patrick Blöbaum, Alexander Schulz, Barbara Hammer

Abstract
We investigate the suitability of unsupervised dimensionality reduction (DR) for transfer learning in the context of different representations of the source and target domain. Essentially, unsupervised DR establishes a link between source and target domain by representing the data in a common latent space. We consider two settings: a linear DR of source and target data which establishes correspondences of the data and an according transfer, and its combination with a nonlinear DR which allows adaptation to more complex data characterised by a global nonlinear structure.

Manuscript from author [PDF]

ES2015-74

Efficient unsupervised clustering for spatial birds population analysis along the river Loire

Aurore Payen, Ludovic Journaux, Clément Delion, Lucile Sautot, Bruno Faivre

Abstract
This paper focuses on the application and comparison of Non Linear Dimensionality Reduction (NLDR) methods on a natural, high-dimensional bird community dataset collected along the Loire River (France). In this context, biologists usually apply the well-known linear PCA to their data in order to explain the longitudinal distribution pattern and find discontinuities along the upstream-downstream gradient. Unfortunately, this method was unsuccessful on this kind of nonlinear dataset. The goal of this paper is to compare recent NLDR methods coupled with different data transformations in order to find the best approach for this nonlinear real-life dataset. Results show that Multiscale Jensen-Shannon Embedding (Ms JSE) is the most successful method on this dataset.

Manuscript from author [PDF]

ES2015-75

NLDR methods for high dimensional NIRS dataset: application to vineyard soils characterization

Clément Delion, Ludovic Journaux, Aurore Payen, Lucile Sautot, Emmanuel Chevigny, Pierre Curmi

Abstract
In the context of vineyard soil characterization, this paper explores and compares different recent Non Linear Dimensionality Reduction (NLDR) methods on a high-dimensional Near InfraRed Spectroscopy (NIRS) dataset. The NLDR methods are based on a k-neighborhood criterion, and both Euclidean and fractional distance metrics are tested. Results show that Multiscale Jensen-Shannon Embedding (Ms JSE) coupled with the Euclidean distance outperforms all other methods. The data are analysed both at a global scale and at different soil depths.

Manuscript from author [PDF]

ES2015-137

Geometrical homotopy for data visualization

Diego Hernán Peluffo-Ordóñez, Juan Carlos Alvarado-Pérez, John Aldo Lee, Michel Verleysen

Abstract
This work presents an approach allowing for interactive visualization of dimensionality reduction outcomes, based on an extended view of conventional homotopy. The pairwise functional that follows from a simple homotopic function can be incorporated within a geometrical framework in order to yield a bi-parametric approach able to combine several kernel matrices. The users can therefore establish the mixture of kernels in an intuitive fashion by varying only two parameters. Our approach is tested using kernel alternatives for conventional methods of spectral dimensionality reduction such as multidimensional scaling, locally linear embedding and Laplacian eigenmaps. The provided mixture represents every single dimensionality reduction approach and helps users find a suitable representation of the embedded data.

Manuscript from author [PDF]

[Back to Top]


Unsupervised learning


ES2015-28

On the equivalence between regularized NMF and similarity-augmented graph partitioning

Anthony Coutant, Hoel Le Capitaine, Philippe Leray

Abstract
Many papers have pointed out the interest of (co-)clustering both data and features in a dataset to obtain better performance than methods focused on data only. In addition, recent work has shown that data and features lie on low-dimensional manifolds embedded in the original space, and this information has been introduced as regularization terms in clustering objectives. Very popular and recent examples are regularized NMF algorithms. However, these techniques have difficulties avoiding local optima and require high computation times, making them inadequate for large-scale data. In this paper, we show that NMF with manifold regularization on a binary matrix is mathematically equivalent to an edge-cut partitioning in a graph augmented with manifold information in the case of hard co-clustering. Based on these results, we explore experimentally the efficiency of regularized graph partitioning methods for hard co-clustering on more relaxed datasets and show that regularized multi-level graph partitioning is much faster and often finds better clustering results than regularized NMF and other well-known algorithms.

Manuscript from author [PDF]

ES2015-64

Ranking Overlap and Outlier Points in Data using Soft Kernel Spectral Clustering

Raghvendra Mall, Rocco Langone, Johan Suykens

Abstract
Soft clustering algorithms can handle real-life datasets better as they capture the presence of inherent overlapping clusters. A soft kernel spectral clustering (SKSC) method proposed in [1] exploited the eigen-projections of the points to assign them different cluster membership probabilities. In this paper, we detect points in dense overlapping regions as overlap points. We also identify the outlier points by exploiting the eigen-projections. We then propose novel ranking techniques using structure and similarity properties in the eigen-space to rank these overlap and outlier points. By ranking the overlap and outlier points we provide an order for the most and least influential points in the dataset. We demonstrate the effectiveness of our ranking measures on several datasets.

Manuscript from author [PDF]

ES2015-108

Towards a Tomographic Index of Systemic Risk Measures

Kaj-Mikael Bjork, Patrick Kouontchou, Amaury Lendasse, Yoan Miché, Bertrand Maillet

Abstract
Due to the recent financial crisis, several systemic risk measures have been proposed in the literature for quantifying financial system-wide distress. In this note we propose an aggregated index for financial systemic risk measurement based on EOF and ICA analyses of the several systemic risk measures released in the recent literature. We use this index to further identify the states of the market as suggested in Kouontchou et al. [2013]. By characterizing market conditions with a robust Kohonen Self-Organizing Map algorithm, we show that this measure is directly linked to crisis market states and that there is a strong link between return and systemic risk.

Manuscript from author [PDF]

ES2015-58

An objective function for self-limiting neural plasticity rules

Rodrigo Echeveste, Claudius Gros

Abstract
Self-organization provides a framework for the study of systems in which complex patterns emerge from simple rules, without the guidance of external agents or fine tuning of parameters. Within this framework, one can formulate a guiding principle for plasticity in the context of unsupervised learning, in terms of an objective function. In this work we derive Hebbian, self-limiting synaptic plasticity rules from such an objective function and then apply the rules to the non-linear bars problem.
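
For orientation, a classic example of a self-limiting Hebbian rule is Oja's rule (shown only to fix ideas; the rules in this work are instead derived from the proposed objective function):

    $\Delta w_i = \varepsilon\, y\,(x_i - y\, w_i),$

where the $-\varepsilon\, y^2 w_i$ term bounds the norm of the weight vector without any external normalization.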

Manuscript from author [PDF]

[Back to Top]


Kernel methods


ES2015-7

Probabilistic Classification Vector Machine at large scale

Frank-Michael Schleif, Andrej Gisbrecht, Peter Tino

Abstract
Probabilistic kernel classifiers are effective approaches to solve classification problems, but only few of them can be applied to indefinite kernels, as typically observed in life science problems, and they are often limited to rather small-scale problems. We provide a novel batch formulation of the Probabilistic Classification Vector Machine for large-scale metric and non-metric data.

Manuscript from author [PDF]

ES2015-111

Online Learning with Operator-valued Kernels

Julien Audiffren, Hachem Kadri

Abstract
We consider the problem of learning a vector-valued function f in an online learning setting. The function f is assumed to lie in a reproducing Hilbert space of operator-valued kernels. We describe an online algorithm for learning f while taking into account the output structure. This algorithm, OLOK, extends the standard kernel-based online learning algorithm NORMA from the scalar-valued to the operator-valued setting. We report a cumulative error bound that holds both for classification and regression. Our experiments show that the proposed algorithm achieves good performance results with low computational cost.

Manuscript from author [PDF]

ES2015-45

Online One-class Classification for Intrusion Detection Based on the Mahalanobis Distance

Patric Nader, Paul Honeine, Pierre Beauseroy

Abstract
Machine learning techniques have been very popular in the past decade for their ability to detect hidden patterns in large volumes of data. Researchers have been developing online intrusion detection algorithms based on these techniques. In this paper, we propose an online one-class classification approach based on the Mahalanobis distance which takes into account the covariance in each feature direction and the different scaling of the coordinate axes. We define the one-class problem by two concentric hyperspheres enclosing the support vectors of the description. We update the classifier at each time step. The tests are conducted on real data.
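
A minimal sketch of the core ingredient, assuming a simple batch setting: points are scored by their Mahalanobis distance to the training data and flagged when the distance exceeds a threshold. The paper's actual method (two concentric hyperspheres enclosing the support vectors of the description, with online updates at each time step) is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 4))                # placeholder "normal behaviour" features
    mu = X_train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))

    def mahalanobis(x):
        d = x - mu
        return float(np.sqrt(d @ cov_inv @ d))         # accounts for covariance and axis scaling

    threshold = np.quantile([mahalanobis(x) for x in X_train], 0.99)
    is_anomaly = lambda x: mahalanobis(x) > threshold  # simple offline decision rule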

Manuscript from author [PDF]

ES2015-34

I/S-Race: An iterative Multi-Objective Racing Algorithm for the SVM Parameter Selection Problem

Miranda Péricles, Ricardo Silva, Ricardo Prudêncio

Abstract
Finding appropriate values for the parameters of an algorithm is an important and time consuming task. Recent studies have shown that racing algorithms can effectively handle this task. This paper presents a multi-objective racing algorithm called iterative S-Race (I/S-Race), which efficiently addresses multi-objective model selection problems in the sense of Pareto optimality. We evaluate the I/S-Race for selecting parameters of SVMs, considering 20 widely-used classification datasets. The results revealed that the I/S-Race is an efficient and effective algorithm for automatic model selection, when compared to a brute-force multi-objective selection approach and the S-Race algorithm.

Manuscript from author [PDF]

ES2015-110

SMO Lattices for the Parallel Training of Support Vector Machines

Markus Kächele, Günther Palm, Friedhelm Schwenker

Abstract
In this work, a method is proposed to train Support Vector Machines in parallel. The difference to other parallel implementations is that the problem is decomposed into hierarchically connected nodes and that each node does not have to fully optimize its local problem. Instead Lagrange multipliers are filtered and transferred between nodes during runtime, with important ones ascending and unimportant ones descending inside the architecture. Experimental validation demonstrates the advantages in terms of speed in comparison to other approaches.

Manuscript from author [PDF]

ES2015-59

Pareto front of bi-objective kernel-based nonnegative matrix factorization

Fei Zhu, Paul Honeine

Abstract
The nonnegative matrix factorization (NMF) is a powerful data analysis and dimensionality reduction technique. So far, the NMF has been limited to a single-objective problem in either its linear or nonlinear kernel-based formulation. This paper presents a novel bi-objective NMF model based on kernel machines, where the decomposition is performed simultaneously in both input and feature spaces. The problem is solved employing the sum-weighted approach. Without loss of generality, we study the case of the Gaussian kernel, where the multiplicative update rules are derived and the Pareto front is approximated. The performance of the proposed method is demonstrated for unmixing hyperspectral images.

Manuscript from author [PDF]

ES2015-69

Learning missing edges via kernels in partially-known graphs

Senka Krivic, Sandor Szedmak, Hanchen Xiong, Justus Piater

Abstract
This paper deals with the problem of learning unknown edges with attributes in a partially given multigraph. The method is an extension of Maximum Margin Multi-Valued Regression (M³VR) to the case where those edges are characterized by different attributes. It is applied to a large-scale problem where an agent tries to learn unknown object-object relations by exploiting known relations of the same kind. The method can handle not only binary relations but also complex structured relations such as text, images, collections of labels, categories, etc., which can be represented by kernels. We compare its performance with a specialized state-of-the-art matrix completion method.

Manuscript from author [PDF]

[Back to Top]