ESANN2014

22nd European Symposium on Artificial Neural Networks
Bruges, Belgium, 23-25 April 2014


Content of the proceedings

WARNING: you need Adobe Acrobat Reader 7.0 or later to view the PDF files below



Advances in Spiking Neural Information Processing Systems (SNIPS)


ES2014-13

Spiking Neural Networks: Principles and Challenges

André Grüning, Sander Bohte

Abstract

Manuscript from author [PDF]

ES2014-50

A new biologically plausible supervised learning method for spiking neurons

Aboozar Taherkhani, Ammar Belatreche, Yuhua Li, Liam Maguire

Abstract
STDP is believed to play an important role in learning and memory. Additionally, experimental evidence shows that a few strong neural inputs can drive a neuron response and subsequently affect learning of other inputs. Furthermore, recent studies have shown that local dendritic depolarization can impact STDP induction. This paper integrates these three biological concepts to devise a new biologically plausible supervised learning method for spiking neurons. Experimental results show that the proposed method can effectively map a random spatiotemporal input pattern to a random target output spike train with a much faster learning speed than ReSuMe.

Manuscript from author [PDF]

ES2014-112

Spiking AGREL

Davide Zambrano, Jaldert O. Rombouts, Cecilia Laschi, Sander Bohte

Abstract
Spiking neural networks are characterised by the spiking neuron models they use and by how these spiking neurons process information communicated through spikes - the neural code. We demonstrate a plausible spiking neural network based on Spike Response Models and predictive spike-coding. When combined with a plausible reinforcement learning strategy - Attention Gated REinforcement Learning (AGREL) - we show for the first time that such plausible spiking neural networks can compute non-linear mappings, including XOR.

Manuscript from author [PDF]

ES2014-172

Classifying Patterns in a Spiking Neural Network

Brian Gardner, André Grüning

Abstract
Learning rules for spiking neural networks have emerged that can classify spatio-temporal spiking patterns as precise target spike trains, although it remains unclear which rule offers the greatest performance. Here, we quantify the performance of a stochastic neuron model in learning to classify input patterns by precise target responses as outputs, and compare it against other learning rules. We achieve a level of performance comparable with that found previously for alternative neuron models, and demonstrate the advantages of classifying inputs by multiple-spike timings: increasing both the performance and the reliability of classifications.

Manuscript from author [PDF]

ES2014-186

Toward STDP-based population action in large networks of spiking neurons

Emmanuel Daucé

Abstract
We present simulation results that clarify the role of Spike-Timing Dependent Plasticity (STDP) in brain processing as a putative mechanism to transfer spatio-temporal regularities, as observed in sensory signals, toward action, expressed as a global increase of the target population activity, followed by a reset. The repetition of this activation-reset mechanism gives rise to a series of synchronous waves of activity when the same stimulus is repeated over and over. Our simulation results are obtained in recurrent networks of conductance-based neurons under realistic coupling constraints.

Manuscript from author [PDF]



Vector quantization- and nearest neighbour-based methods


ES2014-91

Supervised Generative Models for Learning Dissimilarity Data

David Nebel, Barbara Hammer, Thomas Villmann

Abstract
Exemplar-based techniques such as affinity propagation \cite{ap} represent data in terms of typical exemplars. This has two benefits: (i) the resulting models are directly interpretable by humans, since representative exemplars can be inspected in the same way as data points; (ii) the model can be applied to any dissimilarity measure, including non-Euclidean or non-metric settings. Most exemplar-based techniques have been proposed in the unsupervised setting only, so their performance in supervised learning tasks can be weak depending on the given data. Here, we address the problem of learning exemplar-based models for general dissimilarity data in a discriminative framework. For this purpose, we extend a generative model proposed in \cite{srlvq} to an exemplar-based scenario, using a generalized EM framework for its optimization. The resulting classifiers represent data in terms of sparse models while keeping high performance on state-of-the-art benchmarks.

Manuscript from author [PDF]

ES2014-131

Rejection strategies for learning vector quantization

Lydia Fischer, Barbara Hammer, Heiko Wersing

Abstract
We present prototype-based classification schemes, e.g. learning vector quantization, with cost-function-based and geometrically motivated reject options. We evaluate the reject schemes in experiments on artificial and benchmark data sets. We demonstrate that reject options improve the accuracy of the models in most cases, and that the performance of the proposed schemes is comparable to the optimal reject option of the Bayes classifier in cases where the latter is available.

Manuscript from author [PDF]

ES2014-93

Optimization of General Statistical Accuracy Measures for Classification Based on Learning Vector Quantization

Marika Kaden, Wieland Hermann, Thomas Villmann

Abstract
We propose a framework for classification learning based on generalized learning vector quantization using statistical quality measures as the cost function. Statistical measures like the $F$-measure or the Matthews correlation coefficient reflect the performance of two-class classification problems better than simple accuracy, in particular if the data classes are imbalanced. For this purpose, we introduce soft approximations of the quantities contained in the confusion matrix, which are the basis for the calculation of the quality measures.
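The abstract names the $F$-measure and the Matthews correlation coefficient but does not spell out the soft confusion-matrix approximation. As a minimal illustration, the sketch below computes both measures from soft confusion-matrix counts, where hard 0/1 predictions are replaced by class-1 probabilities; this is a common generic construction, not necessarily the parametrization used by the authors.

```python
import math

def soft_confusion(p, y):
    # p: predicted class-1 probabilities in [0, 1]; y: true labels in {0, 1}.
    # With hard predictions (p in {0, 1}) these reduce to the usual counts.
    tp = sum(pi * yi for pi, yi in zip(p, y))
    fp = sum(pi * (1 - yi) for pi, yi in zip(p, y))
    fn = sum((1 - pi) * yi for pi, yi in zip(p, y))
    tn = sum((1 - pi) * (1 - yi) for pi, yi in zip(p, y))
    return tp, fp, fn, tn

def f_measure(tp, fp, fn, tn):
    # F1-measure from (soft) confusion-matrix entries.
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient; 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because the soft counts are differentiable in the classifier outputs, such approximations can serve directly as a cost function for gradient-based learning.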

Manuscript from author [PDF]

ES2014-96

Augmented hashing for semi-supervised scenarios

Zalán-Péter Bodó, Lehel Csato

Abstract
Hashing methods for fast approximate nearest-neighbor search are attracting increasing attention with the rapid growth of available data. Embedding the points into the Hamming space is an important step of the hashing process. Analogously to machine learning, there exist unsupervised, supervised and semi-supervised hashing methods. In this paper we propose a generic procedure to extend unsupervised codeword generators using error-correcting codes and semi-supervised classifiers. To show the effectiveness of the method, we combine linear spectral hashing and two semi-supervised algorithms in the experiments.

Manuscript from author [PDF]

ES2014-57

Improving accuracy by reducing the importance of hubs in nearest-neighbor recommendations

François Fouss, Clémentine Van Parijs

Abstract
A traditional approach for recommending items to persons consists of including a step of forming neighborhoods of users/items. This work focuses on such nearest-neighbor approaches and, more specifically, on a particular type of neighbors: the ones frequently appearing in the neighborhoods of users/items (i.e., very similar to many other users/items in the data set), referred to as hubs in the literature. The aim of this paper is to explore through experiments how the presence of hubs affects the accuracy of nearest-neighbor recommendations.

Manuscript from author [PDF]

ES2014-174

A new approach for multiple instance learning based on a homogeneity bag operator

Alexandre Wagner Chagas Faria, David Menotti, André P. Lemos, Antônio P. Braga

Abstract
Multiple Instance Learning (MIL) proposes a new paradigm for cases where instance labeling in the learning step is impossible or infeasible, by assigning a single label (positive or negative) to a set of instances called a bag. In this paper, an operator based on the homogeneity of positive bags for MIL is introduced. Our method consists of removing instances from the positive bags according to their similarity with the ones from the negative bags. The experimental results show that our operator always increases the accuracy of the Citation kNN algorithm, achieving the best results in 2 out of 4 datasets when compared with other classic methods in the literature.

Manuscript from author [PDF]



Byte the bullet: learning on real-world computing architectures


ES2014-12

Byte The Bullet: Learning on Real-World Computing Architectures

Alessandro Ghio, Luca Oneto

Abstract
Fast, effective, and reliable models: these are the desiderata of every theorist and practitioner. Machine Learning (ML) algorithms, proposed in the last decades, proved to be effective and reliable in solving complex real-world problems, but they are usually designed without taking into account the underlying computing architecture. On the contrary, the effort of contemplating the exploited computing device is often motivated by application-specific and real-world requirements, such as the need to accelerate the learning process with dedicated/distributed hardware, or to foster energy-sparing requirements of applications based on mobile standalone devices. The ESANN 2014 "Byte The Bullet: Learning on Real-World Computing Architectures" special session has pooled a compilation of the most recent proposals in this area, by encouraging submissions related to the development and the application of fast, effective, reliable techniques, which consider possibilities, potentialities and constraints of real-world computing architectures as basic cornerstones and motivations.

Manuscript from author [PDF]

ES2014-27

Learning with few bits on small-scale devices: From regularization to energy efficiency

Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella

Abstract
The implementation of Machine Learning (ML) algorithms on stand-alone small-scale devices allows incorporating in these systems new services and advanced functionalities without the need of resorting to remote computing systems. Despite having undeniable advantages with respect to conventional general-purpose devices, e.g. in terms of cost/performance ratios, small-scale systems suffer from issues related to their resource-limited nature, like limited battery capacity and processing power. In order to deal with such limitations, we propose to merge local Rademacher Complexities and bit-based hypothesis spaces to build thrifty models, which can be effectively implemented on small-scale resource-limited devices. Experiments, carried out on a smartphone in a Human Activity Recognition application, show the benefits of the proposed approach in terms of model accuracy and battery duration.

Manuscript from author [PDF]

ES2014-171

Speedy greedy feature selection: Better redshift estimation via massive parallelism

Fabian Gieseke, Kai Polsterer, Cosmin Eugen Oancea, Christian Igel

Abstract
Nearest neighbor models are among the most basic tools in machine learning, and recent work has demonstrated their effectiveness in the field of astronomy. The performance of these models crucially depends on the underlying metric, and in particular on the selection of a meaningful subset of informative features. The feature selection is task-dependent and usually very time-consuming. In this work, we propose an efficient parallel implementation of incremental feature selection for nearest neighbor models utilizing modern graphics processing units. Our framework provides significant computational speed-ups over its sequential single-core competitor of up to two orders of magnitude. We demonstrate the applicability of the overall scheme on one of the most challenging tasks in astronomy: the detection of distant galaxies.
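The abstract describes incremental (greedy forward) feature selection without giving the procedure; the sketch below is a generic sequential version of that idea, not the authors' parallel GPU implementation. The `score` callback (which would evaluate a nearest neighbor model on a candidate feature subset) is a hypothetical placeholder.

```python
def greedy_forward_selection(features, score):
    # Greedy forward selection: repeatedly add the single feature that most
    # improves the score of the current subset; stop when no feature helps.
    selected = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        cand = max(remaining, key=lambda f: score(selected + [f]))
        s = score(selected + [cand])
        if s <= best:
            break  # no remaining feature improves the model
        selected.append(cand)
        remaining.remove(cand)
        best = s
    return selected
```

Each outer iteration scores every remaining candidate independently, which is exactly the part that parallelizes naturally across GPU threads.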

Manuscript from author [PDF]

ES2014-179

Context- and cost-aware feature selection in ultra-low-power sensor interfaces

Steven Lauwereins, Komail Badami, Wannes Meert, Marian Verhelst

Abstract
This paper introduces the use of machine learning to drastically improve the efficiency of ultra-low-power sensor interfaces. Adaptive feature extraction circuits are assisted by hardware-embedded learning to dynamically activate only the most relevant features. This selection is done in a context- and power-cost-aware way, through a modification of the C4.5 algorithm. Furthermore, the context dependence of different feature sets is explained. As a proof of principle, a Voice Activity Detector is expanded with the proposed context- and cost-dependent voice/noise classifier, resulting in average circuit power savings of 75%, with negligible accuracy loss.

Manuscript from author [PDF]

ES2014-140

Lightning fast asynchronous distributed k-means clustering

Árpád Berta, István Hegedűs, Róbert Ormándi

Abstract
Clustering is one of the most fundamental data processing approaches. This remains true when the computational environment is radically different from the usual centralized architectures. Here we focus on the problem of designing efficient and fast K-Means approaches which work in fully distributed, asynchronous networks without any central control. We assume that the network has a huge number of computational units (even orders of magnitude more than in a typical cloud). Additionally, we allow network failures to occur and computational units to leave and join the network throughout the computation. Our approaches apply online clustering models which take different random walks in the network, updating themselves on the data points stored by the computational units, while various ensemble techniques combine them to achieve faster convergence. We define different instantiations of the general framework applying various ensemble techniques. We evaluate them empirically against several state-of-the-art distributed baseline algorithms in different computational scenarios. The results of our experiments indicate that our methods are robust against network failure, while producing accurate clusterings and converging extremely fast.
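The abstract mentions online clustering models that update themselves on locally stored data points as they random-walk through the network. As a generic illustration of such a local step (not the authors' gossip protocol, whose details are not given here), the sketch below shows one MacQueen-style online k-means update that a model could apply at each visited node.

```python
def online_kmeans_update(centroids, counts, x):
    # One online k-means step: find the centroid nearest to the sample x and
    # move it toward x with a 1/count learning rate (MacQueen's online rule).
    j = min(range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(centroids[k], x)))
    counts[j] += 1
    lr = 1.0 / counts[j]
    centroids[j] = [c + lr * (xi - c) for c, xi in zip(centroids[j], x)]
    return j  # index of the updated centroid
```

In a gossip setting, several such models walk the network independently and are periodically merged by an ensemble rule, which is where the convergence speed-up comes from.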

Manuscript from author [PDF]



Reinforcement learning and optimization


ES2014-128

Selective Neural Network Ensembles in Reinforcement Learning

Stefan Faußer, Friedhelm Schwenker

Abstract
Ensemble models can achieve more accurate predictions than single learners. Selective ensembles further improve the predictions by selecting an informative subset of the full ensemble. We consider reinforcement learning ensembles, where the members are neural networks. In this context we study a new algorithm for ensemble subset selection in reinforcement learning scenarios. The aim of the proposed learning strategy is to minimize the Bellman errors of the collected states. In the empirical evaluation, two benchmark applications with large state spaces have been considered, namely SZ-Tetris and generalized maze. Here, our selective ensemble algorithm significantly outperforms other approaches.

Manuscript from author [PDF]

ES2014-130

Learning resets of neural working memory

Jaldert O. Rombouts, Pieter Roelfsema, Sander Bohte

Abstract
Working memory is a key component of intelligence that the brain implements as persistent neural activations. How do persistent neurons learn to store information, and how can they be made to forget this information once it is no longer relevant? When animals learn episodic tasks, neurons in prefrontal cortex learn to represent task ends. We show that a biologically plausible neural network model equipped with persistent memory and a 'reset' action can learn to store and forget information at task ends by reinforcement learning. The new model has competitive performance compared to a variety of (biologically implausible) models.

Manuscript from author [PDF]

ES2014-136

Naive Augmenting Q-Learning to Process Feature-Based Representations of States

Janis Zuters

Abstract
Temporal difference algorithms perform well on discrete and small problems. This paper proposes a modification of the Q-learning algorithm toward a natural ability to receive a feature list, instead of an already identified state, as input. Complete observability is still assumed. The algorithm, Naive Augmenting Q-Learning, has been designed by building a hierarchical structure of input features (a kind of feature-state mapping) to avoid direct state identification, thus potentially optimizing the resources required for storing and processing action values.
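For reference, the baseline this paper modifies is the standard tabular Q-learning update; the sketch below shows that textbook rule (not the proposed feature-based variant, whose hierarchical structure is not detailed in the abstract). The parameter values are illustrative.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Standard Q-learning update:
    #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    # Q is a dict mapping (state, action) pairs to values, 0.0 by default.
    target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return Q[(s, a)]
```

The paper's contribution is to index this table by a hierarchy of input features rather than by explicitly identified states, trading direct state identification for a more compact representation.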

Manuscript from author [PDF]

ES2014-132

Direct Model-Predictive Control

Jean-Joseph Christophe, Jérémie Decock, Olivier Teytaud

Abstract
Due to its simplicity, efficiency and convenience, Model Predictive Control, which consists in optimizing future decisions based on a pessimistic deterministic forecast of the random processes, is one of the main tools for stochastic control. Yet, it suffers from a large computation time, unless the tactical horizon (i.e. the number of future time steps included in the optimization) is strongly reduced, and from a lack of real stochasticity handling. We here propose a combination of Model Predictive Control and Direct Policy Search (using neural networks), combining the best of both worlds.

Manuscript from author [PDF]

ES2014-127

Ensembles of extreme learning machine networks for value prediction

Pablo Escandell-Montero, José María Martínez Martínez, Emilio Soria-Olivas, Joan Vila-Francés, José David Martín-Guerrero

Abstract
Value prediction is an important subproblem of several reinforcement learning (RL) algorithms. In previous work, it has been shown that the combination of least-squares temporal-difference learning with ELM (extreme learning machine) networks is a powerful method for value prediction in continuous-state problems. This work proposes the use of ensembles to improve the approximation capabilities of ELM networks in the context of RL.

Manuscript from author [PDF]

ES2014-108

An application of the temporal difference algorithm to the truck backer-upper problem

Christopher Gatti, Mark Embrechts

Abstract
We use a reinforcement learning approach to learn a real-world control problem, the truck backer-upper problem. In this problem, a tractor-trailer truck must be backed into a loading dock from an arbitrary location and orientation. Our approach uses the temporal difference algorithm with a neural network as the value function approximator. The novelty of this work is that our implementation is simple, yet it is able to successfully back the truck into the loading dock from random initial locations and orientations.
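The core of the temporal difference approach named in the abstract is the TD(0) value update; the sketch below shows the standard tabular form of that rule (the paper itself uses a neural network approximator instead of a table). The step-size and discount values are illustrative defaults.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s'),
    # i.e. V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V[s]
```

With function approximation, the same TD error multiplies the gradient of the network output with respect to its weights instead of updating a single table entry.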

Manuscript from author [PDF]

ES2014-175

Application of Newton's Method to action selection in continuous state- and action-space reinforcement learning

Barry Nichols, Dimitris Dracopoulos

Abstract
An algorithm based on Newton's Method is proposed for action selection in continuous state- and action-space reinforcement learning without a policy network or discretization. The proposed method is validated on two benchmark problems: Cart-Pole and double Cart-Pole, on which it achieves comparable or improved performance with fewer parameters to tune and in fewer training episodes than CACLA, which has previously been shown to outperform many other continuous state- and action-space reinforcement learning algorithms.

Manuscript from author [PDF]

ES2014-39

Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem

Saba Yahyaaa, Madalina Drugan, Bernard Manderick

Abstract
The multi-objective multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might be conflicting. The agent has a set of optimal arms, and the agent's goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function which converts the multi-objective arms into one-objective arms. The LS function is simple; however, it cannot find the complete optimal arm set. As a result, we extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
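The linear scalarized (LS) function mentioned in the abstract collapses each arm's reward vector into a single scalar via a weight vector, after which ordinary one-objective arm selection applies. The sketch below illustrates that conversion step only; the KG extension itself is not reproduced here, and the function names are illustrative.

```python
def linear_scalarize(weights, reward_vector):
    # Collapse a multi-objective reward vector into one scalar: sum_d w_d * r_d.
    return sum(w * r for w, r in zip(weights, reward_vector))

def best_arm(weights, mean_rewards):
    # Greedy one-objective selection over the scalarized mean reward vectors.
    return max(range(len(mean_rewards)),
               key=lambda a: linear_scalarize(weights, mean_rewards[a]))
```

Because a single weight vector induces a single linear ordering, some Pareto-optimal arms are never the maximizer for any weights on a non-convex front, which is why the plain LS function cannot recover the complete optimal arm set.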

Manuscript from author [PDF]

ES2014-114

Improving the firefly algorithm through the Barnes-Hut tree code

Kiril Ralinovski

Abstract
The firefly algorithm is a nature-inspired meta-heuristic algorithm that has a variety of applications, such as multimodal optimization, clustering and finding good solutions for NP-hard problems. The original algorithm and modifications thereof have so far always calculated interactions between all fireflies individually, which leads to a complexity of O(n^2). In this paper we present a novel approach that reduces the complexity to O(n log n) in lower dimensions by using the Barnes-Hut tree code, which is used for n-body simulations in physics. This is possible due to the similar nature of both problems and requires only small modifications.
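For context, the O(n^2) cost comes from the standard firefly movement step, where every firefly is attracted pairwise to every brighter one. The sketch below shows that naive step (Yang's usual update rule with attractiveness beta0 * exp(-gamma * r^2) plus a small random perturbation); the Barnes-Hut variant proposed in the paper would replace the inner loop with a tree-based approximation, which is not reproduced here. Parameter values are illustrative.

```python
import random, math

def firefly_step(xs, brightness, beta0=1.0, gamma=1.0, alpha=0.01):
    # One naive O(n^2) firefly step: each firefly i moves toward every
    # brighter firefly j with attractiveness beta0 * exp(-gamma * r_ij^2),
    # plus a small uniform random perturbation scaled by alpha.
    n = len(xs)
    new = []
    for i in range(n):
        xi = list(xs[i])
        for j in range(n):
            if brightness[j] > brightness[i]:
                r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(len(xi)):
                    xi[d] += beta * (xs[j][d] - xs[i][d]) \
                             + alpha * (random.random() - 0.5)
        new.append(xi)
    return new
```

The inner loop is structurally the same all-pairs sum as gravitational n-body force evaluation, which is what makes the Barnes-Hut tree approximation applicable.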

Manuscript from author [PDF]

ES2014-106

Improved Cat Swarm Optimization approach applied to reliability-redundancy problem

Carlos E. Klein, Leandro Coelho, Ângelo M. O. Sant’Anna, Roberto Z. Freire, Viviana C. Mariani

Abstract
System reliability-redundancy optimization plays a vital role in real-world applications. Recently, a new meta-heuristic based on swarm intelligence, the cat swarm optimization (CSO) algorithm, has emerged. CSO is a stochastic optimization paradigm inspired by the natural behavior of cats. To enhance the performance of the CSO algorithm, an improved adaptive CSO (ICSO) algorithm is presented. Both the CSO and ICSO approaches were applied to an overspeed protection system for a gas turbine, a benchmark in the reliability-redundancy mixed-integer optimization field. The better results obtained by ICSO show that the algorithm can be an efficient alternative for solving reliability problems.

Manuscript from author [PDF]



Nonlinear dimensionality reduction


ES2014-165

Relevance Learning for Dimensionality Reduction

Alexander Schulz, Andrej Gisbrecht, Barbara Hammer

Abstract
Nonlinear dimensionality reduction (NDR) techniques offer powerful data visualization schemes capturing nonlinear effects of the data, at the cost of decreased interpretability of the projection: unlike for linear counterparts such as principal component analysis, the relevance of the original feature dimensions for the NDR projection is not clear. In this contribution we propose relevance learning schemes for NDR which make it possible to judge the relevance of a feature dimension for the projection. This technique can be extended to a metric learning scheme, which opens a way to imprint the information provided by a given visualization onto the data representation in the original feature space.

Manuscript from author [PDF]

ES2014-76

Dimensionality reduction in decentralized networks by Gossip aggregation of principal components analyzers

Jérôme FELLUS, David Picard, Philippe-Henri Gosselin

Abstract
This paper considers dimensionality reduction in large decentralized networks with limited node-local computing and memory resources and unreliable point-to-point connectivity (e.g., peer-to-peer, sensor or ad-hoc mobile networks). We propose an asynchronous decentralized algorithm built on a Gossip consensus protocol that performs Principal Components Analysis (PCA) of data spread over such networks. All nodes obtain the same local basis that spans the global principal subspace. Reported experiments show that the obtained bases both reach a consensus and accurately estimate the global PCA solution.

Manuscript from author [PDF]

ES2014-64

Multiscale stochastic neighbor embedding: Towards parameter-free dimensionality reduction

John A. Lee, Diego H. Peluffo-Ordonez, Michel Verleysen

Abstract
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction that involves softmax similarities measured between all pairs of data points. To build a suitable embedding, SNE tries to reproduce in a low-dimensional space the similarities that are observed in the high-dimensional data space. Previous work has investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions. This paper proposes an additional refinement, in the form of multiscale similarities, namely weighted sums of softmax ratios with decreasing bandwidths. The objective is to maximize the embedding quality at all scales, with a better preservation of both local and global neighborhoods, and also to exempt the user from having to fix a scale arbitrarily. Experiments on several data sets show that this multiscale version of SNE, combined with an appropriate cost function (sum of Jensen-Shannon divergences), outperforms all previous variants of SNE.
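The softmax similarities and the multiscale refinement described in the abstract can be illustrated concretely. The sketch below computes single-bandwidth SNE similarities p_{j|i} and a uniformly weighted multiscale mixture over several bandwidths; the paper's actual weighting and bandwidth schedule are not given in the abstract, so the uniform average here is an assumption for illustration.

```python
import math

def sne_similarities(X, sigma):
    # Softmax (SNE) similarities for one bandwidth sigma:
    #   p_{j|i} = exp(-||x_i - x_j||^2 / (2 sigma^2)) / sum_{k != i} exp(...)
    n = len(X)
    P = []
    for i in range(n):
        w = []
        for j in range(n):
            if i == j:
                w.append(0.0)  # self-similarity is excluded
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                w.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        s = sum(w)
        P.append([wj / s for wj in w])
    return P

def multiscale_similarities(X, sigmas):
    # Uniformly weighted sum of softmax similarities over several bandwidths,
    # in the spirit of the multiscale refinement described above.
    Ps = [sne_similarities(X, s) for s in sigmas]
    n = len(X)
    return [[sum(P[i][j] for P in Ps) / len(Ps) for j in range(n)]
            for i in range(n)]
```

Small bandwidths emphasize local neighborhoods and large ones global structure, so mixing several scales preserves both while removing the need for the user to fix one scale.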

Manuscript from author [PDF]

ES2014-75

Interactive dimensionality reduction for visual analytics

Ignacio Diaz-Blanco, Abel A. Cuadrado-Vega, Daniel Perez-Lopez, Francisco Garcia-Fernandez, Michel Verleysen

Abstract
In this work, we present a novel approach for data visualization based on interactive dimensionality reduction (iDR). The main idea of the paper relies on considering for visualization the intermediate results of non-convex DR algorithms under changes on the metric of the input data space driven by the user. With an appropriate visualization interface, our approach allows the user to focus on the relationships among dynamically selected groups of variables, as well as to assess the impact of a single variable or groups of variables in the structure of the data.

Manuscript from author [PDF]

ES2014-170

Recent methods for dimensionality reduction: A brief comparative analysis

Diego H. Peluffo-Ordonez, John A. Lee, Michel Verleysen

Abstract
Dimensionality reduction is a key stage in both the design of a pattern recognition system and data visualization. Recently, there has been an increasing interest in methods aimed at preserving the data topology. Among them, Laplacian eigenmaps (LE) and stochastic neighbour embedding (SNE) are the most representative. In this work, we present a brief comparison among very recent alternatives to LE and SNE. Comparisons are done mainly on two aspects: algorithm implementation and complexity. Relations between the methods are also depicted. The goal of this work is to provide some discussion and decision criteria for choosing a method according to the user's needs and/or a good trade-off between performance and required processing time.

Manuscript from author [PDF]



Signal and temporal processing


ES2014-123

Capturing confounding sources of variation in DNA methylation data by spatiotemporal independent component analysis

Emilie Renard, Andrew Teschendorff, Pierre-Antoine Absil

Abstract
Confounding sources of variation, which are often either unknown or known with error, are widespread in genomic datasets, and failing to adjust for them may adversely impact statistical inference. In this context, we propose a 'spatiotemporal' independent component analysis method that possesses a novel invariance property, and we show that the spatiotemporal aspect may increase the ability of the method to model confounding sources of variation.

Manuscript from author [PDF]

ES2014-126

Towards an effective multi-map self organizing recurrent neuronal network

Denis Baheux, Hervé Frezza-Buet, Jeremy Fix

Abstract
This paper presents a multi-map joint self-organizing architecture able to represent non-Markovian temporal sequences. The proposed architecture is inspired by previous works based on dynamic neural fields. It provides a faster architecture that is easier to handle, making it simpler to scale to higher-dimensional machine learning problems.

Manuscript from author [PDF]

ES2014-94

Iterative ARIMA-multiple support vector regression models for long term time series prediction

João Oliveira, Teresa B. Ludermir

Abstract
Support Vector Regression (SVR) has been widely applied in time series forecasting. For long-term predictions, iterative prediction performs many one-step-ahead predictions until the desired horizon is reached. This process accumulates the error from previous predictions and may affect the quality of the forecasts. In order to improve long-term iterative predictions, a hybrid multiple Autoregressive Integrated Moving Average (ARIMA)-SVR model is applied to perform predictions considering the linear and non-linear components of the time series. The results show that the proposed method produces more accurate predictions in the long-term context.
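The iterative prediction scheme described in the abstract, where each one-step-ahead forecast is fed back into the input window, can be sketched generically as below. The one-step model is left as a callback since the paper's hybrid ARIMA-SVR predictor is not specified in the abstract; the function name is illustrative.

```python
def iterative_forecast(predict_one_step, history, horizon):
    # Multi-step forecasting by iterating a one-step-ahead model: each
    # prediction is appended to the window and used as input for the next
    # step, which is how errors from early steps accumulate over the horizon.
    window = list(history)
    out = []
    for _ in range(horizon):
        y = predict_one_step(window)
        out.append(y)
        window.append(y)
    return out
```

Decomposing the series into a linear part (handled by ARIMA) and a non-linear residual (handled by SVR) aims precisely at reducing the per-step error that this loop compounds.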

Manuscript from author [PDF]

ES2014-69

Region of interest detection using MLP

Tommi Kärkkäinen, Alexandr Maslov, Pekka Wartiainen

Abstract
A novel technique to detect regions of interest in a time series as deviation from the characteristic behavior is proposed. The deterministic form of a signal is obtained using a reliably trained MLP neural network with detailed complexity management and cross-validation based generalization assurance. The proposed technique is demonstrated with simulated and real data.

Manuscript from author [PDF]

ES2014-142

Analysis of the Weighted Fuzzy C-means in the problem of source location

Everton Z. Nadalin, Rodrigo C. Silva, Romis Attux, João M. T. Romano

Abstract
This paper proposes the use of the clustering method called Weighted Fuzzy C-means to solve the problem of mixing matrix estimation in underdetermined source separation based on sparse component analysis. The performed comparative analysis shows that the approach has significant application potential, especially if the distributions of the columns of the mixing matrix have a non-uniform character.

Manuscript from author [PDF]

ES2014-125

An Optimized Learning Algorithm Based on Linear Filters Suitable for Hardware-Implemented Self-Organizing Maps

Marta Kolasa, Rafal Dlugosz, Tomasz Talaska, Witold Pedrycz

Abstract
In this study, we present a fast and energy-efficient learning algorithm suitable for Self-Organizing Maps (SOMs) realized in hardware. The proposed algorithm is an extension of the classical algorithm used in the Kohonen SOM. It is based on the observation that the quantization error, a typical quality measure of the learning process, does not decrease linearly over the course of learning. One can observe phases of increased 'activity', during which the quantization error rapidly decreases, followed by 'stagnation' phases, during which its values remain almost the same. The activity phases occur just after the neighborhood radius R is decreased. A set of finite impulse response (FIR) filters is used to detect both phases. This enables automatic switching of the radius R to a smaller value, which shortens a given stagnation phase and starts a new activity phase. Comprehensive investigations carried out by means of a software model of the SOM show that the learning process can be shortened by as much as 80-95%, which allows for a reduction of energy consumption by as much as 70-90%.

Manuscript from author [PDF]

ES2014-115

Enhanced NMF initialization using a physical model for pollution source apportionment

Marc Plouvin, Abdelhakim Limem, Matthieu Puigt, Gilles Delmaire, Gilles Roussel, Dominique Courcot

Abstract
In a previous work, we proposed an informed Non-negative Matrix Factorization (NMF) with a specific parametrization which involves constraints about some known components of the factorization. In this paper we extend the above work by adding some information provided by a physical dispersion model. In particular, we derive a special structure of one of the factorizing matrices, which provides a better initialization of the NMF procedure. Experiments on simulated mixtures of particulate matter sources show that our new approach outperforms both our previous one and the state-of-the-art NMF methods.

Manuscript from author [PDF]

ES2014-14

A New Error-Correcting Syndrome Decoder with Retransmit Signal Implemented with an Hardlimit Neural Network

Jose Fonseca

Abstract
Even today, counting the errors in a noisy received word remains an open problem in the literature. This means that when we use an error-correcting code we cannot check whether the number of errors in the received noisy word is greater than the error correction capability of the code of k errors, k=(d-1)/2, where d is the minimum Hamming distance of the code. The main advantage of our proposal results from the introduction of the Retransmit signal, which the syndrome decoder asserts when it detects an ambiguity and cannot correct the noisy word. These ambiguity situations occur when one more error occurs than the error correction capability of the code allows. This property of the error-correcting syndrome scheme allows increasing the error correction capability of an error-correcting code by one error at the cost of a small increase in bandwidth or transmission delay. Although there are some proposals in the literature for implementing error-correcting decoders with neural networks, our work differs in three main aspects. First, we propose the implementation of the Retransmit signal based on detecting ambiguity in the minimum Hamming distance between the received word and each of the codewords, i.e. when there is more than one codeword at the minimum Hamming distance from the too-noisy received word. Second, we use a constructive approach that does not need training. And finally, we use hardlimit neurons that can be implemented in hardware by a single transistor in a high-gain setup. We begin with two exhaustive simulation experiments, in which we introduce all possible occurrences of two errors in all codewords of two codes with minimum Hamming distances 3 and 4, respectively, which only guarantee correction of all possible single errors, to show how ambiguities arise in the decoding process.
Next we present the building blocks of the error-correcting decoder based on hardlimit multilayer perceptrons, assemble them, and show an example of an error-correcting decoder for a four-codeword error-correcting code. Finally, we discuss the advantages of our proposal and the consequences of introducing the Retransmit signal, and outline possible directions for future work.
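The decoding rule described in this abstract (pick the nearest codeword, but request retransmission on a tie) can be sketched in plain Python; this is illustrative only, since the paper implements the rule as a hardlimit neural network rather than in software.

```python
def decode_with_retransmit(received, codewords):
    """Minimum-Hamming-distance decoding with ambiguity detection: return
    the unique codeword closest to the received word, or None (meaning
    'assert the Retransmit signal') when two or more codewords tie at the
    minimum distance and the decoder cannot resolve the ambiguity."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    dists = [hamming(received, c) for c in codewords]
    dmin = min(dists)
    winners = [c for c, d in zip(codewords, dists) if d == dmin]
    if len(winners) > 1:
        return None  # ambiguity: request retransmission instead of guessing
    return winners[0]
```

For the 3-repetition code {"000", "111"} (minimum distance 3), a single error such as "001" decodes to "000"; for the 2-repetition code {"00", "11"}, the word "01" is equidistant from both codewords and triggers a retransmit.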

Manuscript from author [PDF]

[Back to Top]


Learning of structured and non-standard data


ES2014-11

Recent trends in learning of structured and non-standard data

Frank-Michael Schleif, Peter Tino, Thomas Villmann

Abstract
In many application domains, data are not given in a classical vector space but occur in the form of structural, sequential, or relational characteristics, or other non-standard formats. Such data are often represented as graphs or by means of proximity matrices. These data sets are often also huge and mathematically complicated to treat, calling for new efficient analysis algorithms, which are the focus of this tutorial.

Manuscript from author [PDF]

ES2014-58

Support Vector Ordinal Regression using Privileged Information

Fengzhen Tang, Peter Tino, Pedro Antonio Gutiérrez, Huanhuan Chen

Abstract
We introduce a new methodology, called SVORIM+, for utilizing privileged information of the training examples, unavailable in the test regime, to improve generalization performance in ordinal regression. The privileged information is incorporated during the training by modelling the slacks through correcting functions for each of the parallel hyperplanes separating the ordered classes. The experimental results on several benchmark and time series datasets show that inclusion of the privileged information during training can boost the generalization performance significantly.

Manuscript from author [PDF]

ES2014-70

Segmented shape-symbolic time series representation

Herbert Teun Kruitbosch, Ioannis Giotis, Michael Biehl

Abstract
This work introduces a symbolic time series representation using monotonic sub-sequences and bottom up segmentation. The representation minimizes the square error between the segments and their approximation by monotonic functions. The representation can robustly classify the direction of a segment and is scale invariant with respect to the time and value dimensions. This paper describes two experiments. The first shows how accurately the monotonic functions are able to discriminate between different segments. The second tests how well the segmentation technique recognizes segments and classifies them with correct symbols. Finally this paper illustrates the new representation on real-world data.

Manuscript from author [PDF]

ES2014-82

Adaptive distance measures for sequential data

Bassam Mokbel, Benjamin Paassen, Barbara Hammer

Abstract
Recent extensions of learning vector quantization (LVQ) to general (dis-)similarity data have paved the way towards LVQ classifiers for possibly discrete, structured objects such as sequences addressed by classical alignment. In this contribution, we propose a metric learning scheme based on this framework which allows for autonomous learning of the underlying scoring matrix according to a given discriminative task. Besides facilitating the often crucial and problematic choice of the scoring matrix in applications, this extension offers an increased interpretability of the results by pointing out structural invariances for the given task.

Manuscript from author [PDF]

ES2014-153

Applications of lp-Norms and their Smooth Approximations for Gradient Based Learning Vector Quantization

Mandy Lange, Dietlind Zühlke, Olaf Holz, Thomas Villmann

Abstract
Learning vector quantization applying non-standard metrics has become quite popular for improving classification performance compared to standard approaches using the Euclidean distance. Kernel metrics and quadratic forms belong to the most promising approaches. In this paper we consider Minkowski distances ($l_{p}$-norms). In particular, $l_{1}$-norms are known to be robust against noise in data, so that, if this structural knowledge about the data is available in advance, this norm should be utilized. However, applying them in gradient-based learning algorithms that rely on distance evaluations requires calculating the respective derivatives. Because $l_{p}$-distance formulas contain the absolute value function, smooth approximations thereof are required. We consider in this paper several approaches for smooth, consistent approximations suitable for numerical evaluation and demonstrate their applicability in exemplary real-world applications.
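A common family of smooth surrogates for the absolute value is $\sqrt{x^{2}+\varepsilon}$; the sketch below uses that surrogate (one plausible choice, not necessarily the paper's) to obtain a differentiable $l_{p}$ distance and its gradient, as needed in gradient-based LVQ updates.

```python
import numpy as np

def smooth_lp_distance(x, y, p=1.0, eps=1e-6):
    """Smoothed Minkowski (l_p) distance: each |x_i - y_i| is replaced by
    the differentiable surrogate sqrt((x_i - y_i)^2 + eps)."""
    d = np.sqrt((np.asarray(x, float) - np.asarray(y, float)) ** 2 + eps)
    return np.sum(d ** p) ** (1.0 / p)

def smooth_lp_gradient(x, y, p=1.0, eps=1e-6):
    """Gradient of the smoothed l_p distance with respect to x.
    With D = (sum_i d_i^p)^(1/p) and d_i = sqrt((x_i - y_i)^2 + eps),
    dD/dx_i = D^(1-p) * d_i^(p-2) * (x_i - y_i)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    d = np.sqrt((x - y) ** 2 + eps)
    dist = np.sum(d ** p) ** (1.0 / p)
    return dist ** (1 - p) * d ** (p - 2) * (x - y)
```

For small `eps` the distance is close to the exact $l_{p}$ value while remaining differentiable at zero; for `p=2` the gradient reduces to the familiar `(x - y) / D`.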

Manuscript from author [PDF]

ES2014-155

Utilization of Chemical Structure Information for Analysis of Spectra Composites

Kristin Domaschke, André Roßberg, Thomas Villmann

Abstract
In this paper, we propose utilizing structural information of spectral data during preprocessing to extend the ability of subsequent analysis methods. Specifically, a composite dataset of measured spectra is given, containing mixtures of only a few spectral components. Using the mixture information for a small subset and additional chemical knowledge, theoretical component spectra are generated, which can be used as additional data with known mixture information in further data processing. The resulting extended data set is analyzed using a self-organizing map to predict the unknown mixture ratios of the remaining subset by associative learning.

Manuscript from author [PDF]

ES2014-190

Weighted tree kernels for sequence analysis

Christopher Bowles, James Hogan

Abstract
Genomic sequences are fundamentally text documents, admitting a number of representations according to need and tokenization. Control of gene expression depends crucially on the binding of enzymes to the DNA sequence at a number of small, often poorly conserved, binding sites, precluding successful application of standard pattern search tools. However, some progress is possible due to the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, allowing us to view the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels which extend existing tree kernels for application to weighted trees, in order to capture the features which underpin this prediction task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data, including many other word-based classification tasks in molecular biology and other domains.

Manuscript from author [PDF]

[Back to Top]


Kernel methods


ES2014-129

Easy multiple kernel learning

Fabio Aiolli, Michele Donini

Abstract
The goal of Multiple Kernel Learning (MKL) is to combine kernels derived from multiple sources in a data-driven way with the aim of enhancing the accuracy of a kernel-based machine. In this paper, we propose a time- and space-efficient MKL algorithm that can easily cope with hundreds of thousands of kernels and more. We compared our algorithm with other baselines as well as three state-of-the-art MKL methods, showing that our approach is often superior.

Manuscript from author [PDF]

ES2014-35

Joint SVM for Accurate and Fast Image Tagging

Hanchen Xiong, Sandor Szedmak, Justus Piater

Abstract
This paper studies how joint training of multiple support vector machines (SVMs) can improve the effectiveness and efficiency of automatic image annotation. We cast image annotation as an output-related multi-task learning framework, with the prediction of each tag’s presence as one individual task. Evidently, these tasks are related via correlations between tags. The proposed joint learning framework, which we call joint SVM, can encode the correlation between tags by defining appropriate kernel functions on the outputs. Another practical merit of the joint SVM is that it shares the same computational complexity as one single conventional SVM, although multiple tasks are solved simultaneously. According to our empirical results on an image-annotation benchmark database, our joint training strategy of SVMs can yield substantial improvements, in terms of both accuracy and efficiency, over training them independently. In particular, it outperforms many other state-of-the-art algorithms.

Manuscript from author [PDF]

ES2014-59

Kernel methods for mixed feature selection

Jérôme Paul, Pierre Dupont

Abstract
This paper introduces two feature selection methods to deal with heterogeneous data that include continuous and categorical variables. We propose to plug a dedicated kernel that handles both kinds of variables into a Recursive Feature Elimination procedure using either a non-linear SVM or Multiple Kernel Learning. These methods are shown to offer significantly better predictive results than state-of-the-art alternatives on a variety of high-dimensional classification tasks.

Manuscript from author [PDF]

ES2014-147

The one-sided mean kernel: a positive definite kernel for time series

Nicolas Chrysanthos, Pierre Beauseroy, Hichem Snoussi, Edith Grall-Maes, Fabrice Ferrand

Abstract
We propose in this paper a new kernel for time series in the dynamic time warping family. We demonstrate using the theory of infinitely divisible kernels that this kernel is positive definite. We also illustrate many other interesting practical properties: it is a radial basis kernel, has no issues of diagonal dominance, and presents a consistent behavior in the case of time series sub-sampling.

Manuscript from author [PDF]

ES2014-150

A robust regularization path for the Doubly Regularized Support Vector Machine

Antoine Lachaud, Stéphane Canu, David Mercier, Frederic Suard

Abstract
The Doubly Regularized SVM (DrSVM) is an extension of the SVM using a mixture of L2 and L1 norm penalties. This kind of penalty, sometimes referred to as the elastic net, allows one to perform variable selection while taking into account correlations between variables. An efficient algorithm to compute the whole DrSVM solution path was introduced by Wang. Unfortunately, in some cases, this path is discontinuous, and thus not piecewise linear. To solve this problem, we propose here a new subgradient formulation of the DrSVM problem. This leads us to propose an alternative L1 regularization path algorithm. This reformulation efficiently addresses the aforementioned problem and makes the initialization step more generic. The results show the validity of our subgradient formulation and its efficiency compared to the initial formulation.
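For reference, the elastic-net penalty combines the two norms; the sketch below shows the penalty and its proximal operator as generic building blocks. This is illustrative background only, not the paper's path algorithm.

```python
import numpy as np

def elastic_net_penalty(w, lam1, lam2):
    """Elastic-net penalty: lam1 * ||w||_1 + (lam2 / 2) * ||w||_2^2,
    i.e. the mixture of L1 and L2 penalties used by the DrSVM."""
    return lam1 * np.sum(np.abs(w)) + 0.5 * lam2 * np.dot(w, w)

def prox_elastic_net(v, step, lam1, lam2):
    """Proximal operator of the elastic net: soft-thresholding (the L1 part,
    which zeroes out small coefficients, performing variable selection)
    followed by uniform shrinkage (the L2 part, which keeps correlated
    variables together)."""
    soft = np.sign(v) * np.maximum(np.abs(v) - step * lam1, 0.0)
    return soft / (1.0 + step * lam2)
```

With `lam2 = 0` this reduces to plain L1 soft-thresholding; with `lam1 = 0` it is pure ridge shrinkage.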

Manuscript from author [PDF]

ES2014-34

Tensor decomposition of dense SIFT descriptors in object recognition

Tan Vo, Dat Tran, Ma Wanli

Abstract
In machine vision, the Scale-Invariant Feature Transform (SIFT) and its variants have been widely used in image classification tasks. However, the high dimensionality of SIFT features, usually in the order of multiple thousands per image, requires careful handling to achieve accurate and timely categorization of objects within images. This paper explores the possibility of processing SIFT features as tensors and uses tensor decomposition techniques on high-order SIFT tensors for dimensionality reduction. The method focuses on both accuracy and efficiency, and validation results on the Caltech 101 dataset confirm the improvement by notable margins.

Manuscript from author [PDF]

ES2014-28

Fine-tuning of support vector machine parameters using racing algorithms

Péricles Miranda, Ricardo Silva, Ricardo Prudêncio

Abstract
This paper investigates the iterative racing approach, I/F-Race, for selecting parameters of SVMs. As a racing algorithm, I/F-Race eliminates candidate models as soon as there is sufficient statistical evidence of their inferiority relative to other models with respect to the objective. The results revealed that the I/F-Race algorithm was able to achieve better parameter values in comparison to default parameters used in the literature and parameters suggested by particle swarm optimization techniques.

Manuscript from author [PDF]

ES2014-4

reject option paradigm for the reduction of support vectors

Sousa Ricardo, Ajalmar R. da Rocha Neto, Guilherme Barreto, Jaime S. Cardoso, Miguel T. Coimbra

Abstract
In this paper we introduce a new conceptualization for reducing the number of support vectors (SVs) for an efficient design of support vector machines (SVMs). The techniques presented here provide a good balance between SV reduction, performance accuracy, and generalization capability. Our proposal explores concepts from classification with reject option mechanisms, which output a third class (the rejected instances) for a binary problem when a prediction cannot be given with sufficient confidence. Rejected instances, along with misclassified ones, are discarded from the original data, giving rise to a classification problem that can be solved linearly. Our experimental study on two benchmark datasets shows significant gains in terms of SV reduction with competitive performance.

Manuscript from author [PDF]

ES2014-156

The Choquet kernel for monotone data

Ali Fallah Tehrani, Marc Strickert, Eyke Hüllermeier

Abstract
In this paper, we introduce a kernel for monotone data derived from the Choquet integral with its underlying fuzzy measure. While a naïve computation of this kernel has a complexity that is exponential in the number of data attributes, we propose a more efficient approach with quadratic time complexity. Kernel PCA and SVM classification are employed to illustrate the characteristics and benefits of the new Choquet kernel in two experiments related to decision-making and pricing.

Manuscript from author [PDF]

[Back to Top]


Learning and Modeling Big Data


ES2014-8

Learning and modeling big data

Barbara Hammer, Haibo He, Thomas Martinetz

Abstract
Driven by powerful sensors, advanced digitalisation techniques, and dramatically increased storage capabilities, big data, in the sense of large or streaming data sets, very high dimensionality, or complex data formats, constitutes one of the major challenges faced by machine learning today. In this realm, a couple of typical assumptions of machine learning can no longer be met, such as the possibility to deal with all data in batch mode, or data being identically distributed; this creates the need for novel algorithmic developments and paradigm shifts, or for the adaptation of existing approaches to cope with such situations. The goal of this tutorial is to give an overview of recent machine learning approaches for big data, with a focus on principled algorithmic ideas in the field.

Manuscript from author [PDF]

ES2014-61

Agglomerative hierarchical kernel spectral clustering for large scale networks

Mall Raghvendra, Langone Rocco, Suykens Johan

Abstract
We propose an agglomerative hierarchical kernel spectral clustering (AH-KSC) model for large scale complex networks. The kernel spectral clustering (KSC) method uses a primal-dual framework to build a model on a subgraph of the network. We exploit the structure of the projections in the eigenspace to automatically identify a set of distance thresholds. These thresholds lead to the different levels of hierarchy in the network. We use these distance thresholds on the eigen-projections of the entire network to obtain a hierarchical clustering in an agglomerative fashion. The proposed approach locates several levels of hierarchy which have clusters with high modularity (Q) and high adjusted rand index (ARI) with respect to the ground-truth communities. We compare AH-KSC with two state-of-the-art large scale hierarchical community detection techniques.

Manuscript from author [PDF]

ES2014-33

Proximity learning for non-standard big data

Frank-Michael Schleif

Abstract
Complicated data sets, in terms of high volume and variety, are more and more common. In the life science domain, for example, modern measurement systems generate huge amounts of data, which are only partially structured but have to be analyzed in reasonable time to trigger decisions and further analysis steps. Available algorithms for analyzing such data often do not scale to larger problems or are inapplicable due to the variety of the data formats. Relational learning methods, as effective computational intelligence algorithms, can be used only for moderately large metric datasets. Here we discuss novel strategies to open relational methods to non-standard data formats at large scale, applied to a very large protein sequence database.

Manuscript from author [PDF]

ES2014-36

Predicting Grain Protein Content of Winter Wheat

Mansouri Majdi, Marie-France Destain , Benjamin Dumont

Abstract
The objective of this paper is to propose a new Improved Particle Filtering (IPF) method, based on minimizing the Kullback-Leibler divergence, for crop model predictions. The performance of the method is compared with that of conventional Particle Filtering (PF) on a complex crop model (AZODYN) to predict an important winter-wheat quality criterion, namely the grain protein content. Furthermore, the effect of measurement noise (e.g., different signal-to-noise ratios) on the performance of PF and IPF is investigated. The results of the comparative studies show that the IPF provides a significant improvement over the PF because, unlike the PF, which depends on the choice of sampling distribution used to estimate the posterior distribution, the IPF yields an optimal choice of the sampling distribution, which also accounts for the observed data. The efficiency of IPF is expressed in terms of estimation accuracy (root mean square error).

Manuscript from author [PDF]

[Back to Top]


Classification


ES2014-44

On the complexity of shallow and deep neural network classifiers

Monica Bianchini, Franco Scarselli

Abstract
Recently, deep networks have been proved more effective than shallow architectures for facing complex real-world applications. However, theoretical results supporting this claim are still few and incomplete. In this paper, we propose a new topological measure to study how the depth of feedforward networks impacts their ability to implement high-complexity functions. Upper and lower bounds on network complexity are established, based on the number of hidden units and on their activation functions, showing that deep architectures are able, with the same number of resources, to address more difficult classification problems.

Manuscript from author [PDF]

ES2014-188

Evidence build-up facilitates on-line adaptivity in dynamic environments: example of the BCI P300-speller

Emmanuel Daucé, Eoin Thomas

Abstract
We consider a P300 BCI application where the subjects have to spell out figures and letters in an unsupervised fashion. We (i) show that a generic speller can attain state-of-the-art accuracy without any training phase or calibration, and (ii) present an adaptive setup that consistently increases the bit rate for most of the subjects.

Manuscript from author [PDF]

ES2014-16

Choosing the Metric in High-Dimensional Spaces Based on Hub Analysis

Dominik Schnitzer, Arthur Flexer

Abstract
To avoid the undesired effects of distance concentration in high dimensional spaces, previous work has already advocated the use of fractional lp norms instead of the ubiquitous Euclidean norm. Closely related to concentration is the emergence of hub and anti-hub objects. Hub objects have a small distance to an exceptionally large number of data points while anti-hubs lie far from all other data points. The contribution of this work is an empirical examination of concentration and hubness, resulting in an unsupervised approach for choosing an lp norm by minimizing hubs while simultaneously maximizing nearest neighbor classification.
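The hubness criterion can be illustrated with a small sketch: compute each point's k-occurrence count N_k under a given l_p norm and measure the skewness of its distribution. This is a hypothetical minimal version of the kind of statistic discussed in the abstract, with illustrative parameter choices.

```python
import numpy as np

def k_occurrence_skewness(X, k=5, p=2.0):
    """Skewness of the k-occurrence distribution N_k (how often each point
    appears among the k nearest neighbours of the others) under the l_p
    norm. Strong positive skewness indicates pronounced hubs; scanning p
    and minimizing this statistic is one simple unsupervised criterion."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    counts = np.zeros(n)
    for i in range(n):
        d = np.sum(np.abs(X - X[i]) ** p, axis=1) ** (1.0 / p)
        d[i] = np.inf                      # exclude the point itself
        counts[np.argsort(d)[:k]] += 1     # point i votes for its k neighbours
    m, s = counts.mean(), counts.std()
    return float(np.mean((counts - m) ** 3) / s ** 3) if s > 0 else 0.0
```

Fractional norms would correspond to `p < 1` in this sketch; the brute-force O(n^2) distance computation is for clarity only.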

Manuscript from author [PDF]

ES2014-62

Robust outlier detection with L0-SVDD

Meriem El Azami, Carole Lartizien, Stéphane Canu

Abstract
The problem of outlier detection consists in finding data that are not representative of the population from which they were ostensibly derived. Recently, to solve this problem, Liu et al. proposed a two-step hypersphere-based approach, taking into account a confidence score pre-calculated for each input datum. However, these scores are defined in a first phase, independently from the second one, making this approach ill-suited for large stream data. To overcome these difficulties, we propose a global reformulation of the support vector data description (SVDD) problem based on the L0 norm, well suited for outlier detection. We demonstrate that this L0-SVDD problem can be solved using an iterative procedure providing data-specific weighting terms. We show that our approach outperforms state-of-the-art outlier detection techniques on both synthetic and real clinical data.

Manuscript from author [PDF]

ES2014-45

Toward parallel feature selection from vertically partitioned data

Veronica Bolon-Canedo, Noelia Sánchez-Maroño, Joana Cervino-Rabunal

Abstract
Feature selection is often required as a preliminary step for many pattern recognition problems. In recent years, parallel learning has been the focus of much attention due to the advent of high dimensionality. Still, most of the existing algorithms only work in a centralized manner, i.e. using the whole dataset at once. This paper proposes a parallel filter approach for vertically partitioned data. The idea is to split the data by features and then apply a filter at each partition performing several rounds to obtain a stable set of features. Later, a merging procedure is carried out to combine the results into a single subset of relevant features. Experiments on three representative datasets show that the execution time is considerably shortened whereas the performance is maintained or even improved compared to the standard algorithms applied to the non-partitioned datasets. The proposed approach can be used with any filter algorithm, so it could be seen as a general framework for parallel feature selection.

Manuscript from author [PDF]

ES2014-100

Modeling consumption of contents and advertising in online newspapers

Iago Porto-Díaz, David Martínez-Rego, Oscar Fontenla-Romero, Amparo Alonso-Betanzos

Abstract
This paper presents the design of a system for personalization of contents and advertising for readers of online newspapers. This software is conceived to work in a context of high network traffic with millions of URLs served each day. The model is divided into two subsystems. The first one takes care of the recommendation of news items. The mathematical model is based on the PageRank algorithm and considers several practical day-to-day scenarios. The second one, the subsystem for personalization of advertising, uses a multinomial logistic regression model in order to predict categories of advertising for banners within the news content. The system has obtained satisfactory practical results using real data.

Manuscript from author [PDF]

ES2014-99

Reweighted l1 Dual Averaging Approach for Sparse Stochastic Learning

Jumutc Vilen, Suykens Johan

Abstract
Recent advances in stochastic optimization and regularized dual averaging approaches have generated substantial interest in simple and scalable stochastic methods tailored to more specific needs. Among the latter are sparse signal recovery and l0-based sparsity-inducing approaches. These methods in particular can force many components of the solution to shrink to zero, thus clarifying the importance of the features and simplifying the evaluation. In this paper we concentrate on enhancing the sparsity of the recently proposed l1 Regularized Dual Averaging (RDA) method with a simple reweighting iterative procedure which in the limit applies the l0-norm penalty. We present theoretical justifications of a bounded regret for a sequence of convex repeated games where every game stands for a separate reweighted l1-RDA problem. Numerical results show an enhanced sparsity of the proposed approach and some improvements over the l1-RDA method in generalization error.

Manuscript from author [PDF]

ES2014-121

Using Shannon Entropy as EEG Signal Feature for Fast Person Identification

Dinh Phung, Dat Tran, Ma Wanli, Phuoc Nguyen, Pham Tien

Abstract
Identification accuracy and speed are important factors in automatic person identification systems. In this paper, we propose a feature extraction method that extracts brain wave features from different brain rhythms of the electroencephalography (EEG) signal for the purpose of fast, yet accurate person identification. The proposed feature extraction method is based on the fact that the EEG signal is complex, non-stationary, and non-linear, for which non-linear analysis such as entropy is more appropriate. Shannon entropy (SE) based EEG features from the alpha, beta, and gamma wave bands are extracted and evaluated for person identification. Experimental results show that SE features provide high person identification rates with a low feature dimension, and thus better performance.
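A histogram-based Shannon entropy estimate, computed per band-passed rhythm, is the core of such a feature. A minimal sketch follows (the band-pass filtering step is omitted, and the bin count is a hypothetical parameter):

```python
import numpy as np

def shannon_entropy(signal, bins=32):
    """Shannon entropy of a 1-D signal segment, estimated from an amplitude
    histogram: H = -sum_i p_i * log2(p_i). Applied to each rhythm (alpha,
    beta, gamma) after band-pass filtering, this yields one scalar feature
    per band, giving a very low-dimensional feature vector."""
    hist, _ = np.histogram(signal, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                 # drop empty bins (0 * log 0 is taken as 0)
    return -np.sum(p * np.log2(p))
```

A constant segment has zero entropy, while a uniformly distributed segment approaches the maximum of log2(bins).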

Manuscript from author [PDF]

ES2014-159

Machine learning techniques to assess the performance of a gait analysis system

Sébastien Piérard, Rémy Phan-Ba, Marc Van Droogenbroeck

Abstract
This paper presents a methodology based on machine learning techniques to assess the performance of a system measuring the trajectories of the lower limbs extremities for the follow-up of patients with multiple sclerosis. We show how we have established, with the help of machine learning, four important properties about this system: (1) an automated analysis of gait characteristics provides an improved analysis with respect to that of a human expert, (2) after learning, the gait characteristics provided by this system are valuable compared to measures taken by stopwatches, as used in the standardized tests, (3) the motion of the lower limbs extremities contains a lot of useful information about the gait, even if it is only a small part of the body motion, (4) a measurement system combined with a machine learning tool is sensitive to intra-subject modifications of the walking pattern.

Manuscript from author [PDF]

ES2014-144

A Random Forest proximity matrix as a new measure for gene annotation

Jose A. Seoane, Ian N.M. Day, Juan P. Casas, Colin Campbell, Tom R. Gaunt

Abstract
In this paper we present a new score for gene annotation, based on the proximity matrix obtained from a trained Random Forest (RF) model. As an example application, we built this model using the association p-values of genotype with blood phenotype as input and the association of genotype data with coronary heart disease as output. This new score has been validated by comparing Gene Ontology (GO) annotation using this score versus the score obtained from the “String” gene annotation tool. Using the new proximity-based measure results in more accurate annotation, especially in the GO categories Molecular Function and Biological Process.

Manuscript from author [PDF]

ES2014-149

Neural network based 2D/3D fusion for robotic object recognition

Louis-Charles Caron, Yang Song, Filliat David, Alexander Gepperth

Abstract
We present a neural network based fusion approach for robotic object recognition which integrates 2D and 3D descriptors in a flexible way. The presented recognition architecture is coupled to a real-time segmentation step based on 3D data, since a focus of our investigations is real-world operation. In light of the fact that recognition must operate on less-than-perfect segmentation results (a very realistic assumption for real-time robotic implementation), we conduct tests of recognition performance using complex everyday objects in order to quantify the overall gain of performing 2D/3D fusion, and to discover where it is particularly useful. We find that the fusion approach is most powerful when generalization is required, for example to significant viewpoint changes, and that a perfect segmentation is apparently not a necessary prerequisite for successful object discrimination.

Manuscript from author [PDF]

ES2014-148

Discrimination of visual pedestrians data by combining projection and prediction learning

Mathieu Lefort, Alexander Gepperth

Abstract
PROPRE is a generic, semi-supervised neural learning paradigm that extracts meaningful concepts from multimodal data flows based on predictability across modalities. It consists of the combination of two computational paradigms. First, a topological projection of each data flow onto a self-organizing map (SOM) reduces the input dimension. Second, each SOM's activity is used to predict the activities in all other SOMs. A predictability measure, which compares predicted and real activities, is used to modulate the SOM learning so as to favor mutually predictable stimuli. In this article, we study PROPRE applied to a classical visual pedestrian data classification task. The SOM learning modulation introduced in PROPRE significantly improves classification performance.

Manuscript from author [PDF]

ES2014-138

FINGeR: Framework for interactive neural-based gesture recognition

German Ignacio Parisi, Pablo Barros, Stefan Wermter

Abstract
For operating in real world scenarios, the recognition of human gestures must be adaptive, robust and fast. Despite the prominent use of Kinect-like range sensors for demanding visual tasks involving motion, it still remains unclear how to process depth information for efficiently extrapolating the dynamics of hand gestures. We propose a learning framework based on neural evidence for processing visual information. We first segment and extract spatiotemporal hand properties from RGB-D videos. Shape and motion features are then processed by two parallel streams of hierarchical self-organizing maps and subsequently combined for a more robust representation. We provide experimental results to show how multi-cue integration increases recognition rates over a single-cue approach.

Manuscript from author [PDF]

ES2014-168

Beyond histograms: why learned structure-preserving descriptors outperform HOG

Thomas Guthier, Volker Willert, Julian Eggert

Abstract
Statistical image descriptors based on histograms (e.g. SIFT, HOG) are widely used in image processing, because they are fast and simple methods with high classification performance. However, they discard the local spatial topology and thus lose discriminative information contained in the image. We discuss the relations between HOG and VNMF descriptors, i.e. structure-free histograms versus learned structure-preserving patterns. VNMF is a shift-invariant, sparse, non-negative unsupervised learning algorithm that provides a distinct decomposition of the input into its parts. The VNMF descriptor outperforms the statistical HOG descriptor, because it preserves spatial topology, leading to better classification results on real-world human action recognition benchmarks.

Manuscript from author [PDF]

ES2014-151

NMF-Density: NMF-Based Breast Density Classifier

Lahouari Ghouti, Abdullah Owaidh

Abstract
The amount of tissue available in the breast, commonly characterized by the breast density, is highly correlated with breast cancer. In fact, dense breasts have a higher risk of developing breast cancer. On the other hand, breast density influences the mammographic interpretation since it decreases the sensitivity of breast cancer detection. This sensitivity decrease is due to the fact that both cancerous regions and dense tissue appear as white areas in breast mammograms. This paper introduces new features to improve the classification of breast density in digital mammograms according to the commonly used radiological lexicon (BI-RADS). These features are extracted from non-negative matrix factorization (NMF) of mammograms and classified using machine learning classifiers. Using ground truth mammographic data, the classification performance of the proposed features is assessed. Simulation results show that they significantly outperform existing density features based on principal component analysis (PCA), achieving higher classification accuracy.

Manuscript from author [PDF]

[Back to Top]


Dynamical systems and online learning


ES2014-49

Implicitly and explicitly constrained optimization problems for training of recurrent neural networks

Carl-Johan Thore

Abstract
Training of recurrent neural networks is typically formulated as unconstrained optimization problems. There is, however, an implicit constraint stating that the equations of state must be satisfied at every iteration in the optimization process. Such constraints can make a problem highly non-linear and thus difficult to solve. A potential remedy is to reformulate the problem into one in which the parameters and state are treated as independent variables and all constraints appear explicitly. In this paper we compare an implicitly and an explicitly constrained formulation of the same problem. Reported numerical results suggest that the latter is in some respects superior.

Manuscript from author [PDF]

ES2014-166

A HMM-based pre-training approach for sequential data

Luca Pasa, Alberto Testolin, Alessandro Sperduti

Abstract
Much recent research has highlighted the critical role of unsupervised pre-training in improving the performance of neural network models. However, extensions of those architectures to the temporal domain introduce additional issues, which often prevent good performance from being obtained in a reasonable time. We propose a novel approach to pre-training sequential neural networks in which a simpler, approximate distribution generated by a linear model is first used to drive the weights into a better region of the parameter space. After this smooth distribution has been learned, the network is fine-tuned on the more complex real dataset. The benefits of the proposed method are demonstrated on a prediction task using two datasets of polyphonic music, and the general validity of this strategy is shown by applying it to two different recurrent neural network architectures.

Manuscript from author [PDF]

ES2014-162

Exploiting similarity in system identification tasks with recurrent neural networks

Sigurd Spieckermann, Siegmund Düll, Steffen Udluft, Alexander Hentschel, Thomas Runkler

Abstract
A new dual-task learning approach based on recurrent neural networks with factored tensor components for system identification tasks is presented. The overall goal is to identify the underlying dynamics of a system given few observations which are complemented by auxiliary data from similar systems. The resulting system identification is motivated by various real-world industrial use cases, e.g. gas or wind turbine modeling for optimization and monitoring. The problem is formalized and the effectiveness of the proposed method is assessed on the cart-pole benchmark.

Manuscript from author [PDF]

ES2014-60

Comparison of local and global undirected graphical models

Zhemin Zhu, Djoerd Hiemstra, Peter Apers, Andreas Wombacher

Abstract
CRFs are discriminative undirected models which are globally normalized. Global normalization protects CRFs from the label bias problem, from which most local models suffer. Recently proposed co-occurrence rate networks (CRNs) are also discriminative undirected models. In contrast to CRFs, CRNs are locally normalized. It was established that CRNs are immune to the label bias problem even though they are local models. In this paper, we further compare ECRNs (using fully empirical relative frequencies rather than support vector regression, in contrast to a later paper of ours) and CRFs. The connection between the co-occurrence rate, which is the exponential function of pointwise mutual information, and copulas is established in the continuous case. Both models are further evaluated statistically through experiments.

Manuscript from author [PDF]

ES2014-139

Meta Online Learning: Experiments on a Unit Commitment Problem

Jialin Liu, Olivier Teytaud

Abstract
Online learning is machine learning performed in real time from successive data samples. Meta online learning consists in combining several online learning algorithms from a given set (termed portfolio) of algorithms. The goal can be (i) mitigating the effect of a bad choice of online learning algorithm, (ii) parallelization, or (iii) combining the strengths of different algorithms. We propose a methodology, termed lag, adapted to black-box online learning.

Manuscript from author [PDF]

ES2014-189

DELA: A Dynamic Online Ensemble Learning Algorithm

Hamid Bouchachia, Emili Balaguer-Ballester

Abstract
The present paper investigates the problem of prediction in the context of a dynamically changing environment, where data arrive over time. A Dynamic online Ensemble Learning Algorithm (DELA) is introduced. The adaptivity concerns three levels: structural adaptivity, combination adaptivity and model adaptivity. In particular, the structure of the ensemble evolves in order to deal with the problem of data drift. The proposed online ensemble is evaluated on the STAGGER dataset to show its predictive power in the presence of data drift.

Manuscript from author [PDF]

[Back to Top]


Advances on Weightless Neural Systems


ES2014-7

Advances on Weightless Neural Systems

Massimo De Gregorio, Felipe M. G. França, Priscila M. V. Lima, Wilson R. de Oliveira

Abstract

Manuscript from author [PDF]

ES2014-51

Learning state prediction using a weightless neural explorer

Igor Aleksander, Helen Morton

Abstract
A weightless neural state machine acting as an exploratory automaton changes its position in a simulated toy world through its own actions. A popular question is asked: how might the automaton ‘become conscious of’ the effect of its own actions? Here we develop previously defined iconic learning in such weightless machines so that this knowledge can be achieved. Weightlessness and iconic learning are expressed in terms of state equations. Experimental results showing the conditions under which correct predictions can be obtained on a neural simulator are presented. Issues of information integration and memory implication are briefly considered at the end of the paper.

Manuscript from author [PDF]

ES2014-160

Can you follow that guy?

Mariacarla Staffa, Massimo De Gregorio, Maurizio Giordano, Silvia Rossi

Abstract

Manuscript from author [PDF]

ES2014-107

Credit analysis with a clustering RAM-based neural classifier

Douglas Cardoso, Danilo Carvalho, Daniel Alves, Diego Souza, Hugo Carneiro, Carlos Pedreira, Priscila M. V. Lima, Felipe M. G. França

Abstract
Datasets with a large amount of noisy data are quite common in real-world classification problems. Robustness is an important characteristic of state-of-the-art classifiers, which typically use error minimization techniques and thus require a long time to converge. This paper presents ClusWiSARD, a clustering customization of the WiSARD weightless neural network model, applied to credit analysis, a non-trivial real-world problem. Experimental evidence shows that ClusWiSARD is very competitive with SVM with respect to accuracy. Nonetheless, ClusWiSARD outperforms SVM in both training time, being two orders of magnitude faster, and test time, being two to three times faster.

Manuscript from author [PDF]

ES2014-182

Training a classical weightless neural network in a quantum computer

Adenilton J. da Silva, Wilson R. de Oliveira, Teresa B. Ludermir

Abstract
The purpose of this paper is to investigate a new quantum learning algorithm for classical weightless neural networks. The learning algorithm creates a superposition of all possible neural network configurations for a given architecture. The performance of the network over the training set is stored entangled with the neural configuration, and a quantum search is performed to amplify the probability amplitude of the network with the desired performance.

Manuscript from author [PDF]

ES2014-167

Extracting rules from DRASiW’s "mental images"

Paulo Coutinho, Hugo Carneiro, Danilo Carvalho, Felipe M. G. França

Abstract
DRASiW is an extension of the WiSARD weightless neural model that provides the ability to produce examples/prototypes, called "mental images", from learnt categories. This work introduces a novel way of performing rule extraction by applying the WiSARD/DRASiW RAM-based neural model to a well-known machine learning benchmark. A functional exploration is offered in order to demonstrate how the new rule extraction mechanism behaves under different system configurations. Experimental results suggest that the rules' conformance to the data increases proportionally to the corresponding classifier accuracy. Furthermore, a comparison with the C4.5 decision tree algorithm shows that the DRASiW-based technique produces more compact sets of rules.

Manuscript from author [PDF]

ES2014-176

Vector space weightless neural networks

Wilson R. de Oliveira, Adenilton J. da Silva, Teresa B. Ludermir

Abstract
By embedding the boolean space $Z_2$ as an orthonormal basis of a vector space, we can treat the RAM-based neuron as a matrix (operator) acting on the vector space. We show that this model (inspired by our research on quantum neural networks) is general enough to include classical weighted (perceptron-like), classical weightless (RAM-based, PLN, etc.), quantum weighted and quantum weightless neural models as particular cases. We also indicate how one could use it to solve 3-SAT, and briefly mention how one could train this novel model.

Manuscript from author [PDF]

ES2014-89

Online tracking of multiple objects using WiSARD

Rafael Lima de Carvalho, Danilo Carvalho, Priscila M. V. Lima, Félix Mora-Camino, Felipe M. G. França

Abstract
This paper evaluates the WiSARD weightless model as a classification system for the problem of tracking multiple objects in real time. Exploiting the structure of this model, the proposed solution applies a re-learning stage in order to avoid interference caused by background noise or variations in the target shape. Once the tracker finds a target for the first time, it performs only local searches in the target's neighbourhood in order to respond quickly. This approach is evaluated through experiments on real-world video data.

Manuscript from author [PDF]

ES2014-83

Probabilistic automata simulation with single layer weightless neural networks

Adenilton J. da Silva, Wilson R. de Oliveira, Teresa B. Ludermir

Abstract
The computability of weightless neural networks is the major topic of this paper. In previous works it has been shown that one can simulate a Turing machine with a weightless neural network (WNN) with an infinite tape. It has also been shown that one can simulate probabilistic automata with a WNN with two queues. In this paper, we show that it is possible to simulate a probabilistic automaton with a single-layer WNN with no auxiliary data structures.

Manuscript from author [PDF]

[Back to Top]


Clustering


ES2014-163

Optimal Data Projection for Kernel Spectral Clustering

Diego H. Peluffo-Ordonez, Carlos Alzate, Johan Suykens, Germán Castellanos-Dominguez

Abstract
Spectral clustering has taken an important place in the context of pattern recognition, being a good alternative for solving problems with non-linearly separable groups. Because of their unsupervised nature, clustering methods are often parametric, requiring some initial parameters. Thus, clustering performance is greatly dependent on the selection of those initial parameters. Furthermore, tuning such parameters is not an easy task when the initial data representation is not adequate. Here, we propose a new projection of the input data to improve cluster identification within a kernel spectral clustering framework. The proposed projection is derived from a feature extraction formulation, in which a generalized distance involving the kernel matrix is used. The data projection proves to be useful for improving the performance of kernel spectral clustering.

Manuscript from author [PDF]

ES2014-119

Supporting GNG-based clustering with local input space histograms

Jochen Kerdels, Gabriele Peters

Abstract
This paper presents an extension to the growing neural gas (GNG) algorithm that captures local characteristics of the input space. Using these characteristics, clustering schemes based on the GNG network can be improved by discarding uncertain edges of the network and identifying edges that span discontinuous regions of the input space. We applied the described approach to different two-dimensional data sets found in the literature and obtained comparable results.

Manuscript from author [PDF]

ES2014-104

The Sum-over-Forests clustering

Mathieu Senelle, Marco Saerens, François Fouss

Abstract
This work introduces a novel way to identify dense regions in a graph based on a mode-seeking clustering technique, relying on the Sum-Over-Forests (SoF) density index (which can easily be computed in closed form through a simple matrix inversion) as a local density estimator. We first identify the modes of the SoF density in the graph. Then, the nodes of the graph are assigned to the cluster corresponding to the nearest mode, according to a new kernel, also based on the SoF framework. Experiments on artificial and real datasets show that the proposed index performs well in node clustering.

Manuscript from author [PDF]

ES2014-46

Bayesian non-parametric parsimonious clustering

Faicel Chamroukhi, Marius Bartcus, Hervé Glotin

Abstract
This paper proposes a new Bayesian non-parametric approach for clustering. It relies on an infinite Gaussian mixture model with a Chinese Restaurant Process (CRP) prior, and an eigenvalue decomposition of the covariance matrix of each cluster. The CRP prior makes it possible to control the model complexity in a principled way and to automatically learn the number of clusters. The covariance matrix decomposition allows various parsimonious models to be fitted, going from the simplest spherical ones to the most complex general one. We develop an MCMC Gibbs sampler to learn the models. First results obtained on both simulated and real data highlight the interest of the proposed infinite parsimonious mixture model.

Manuscript from author [PDF]

ES2014-98

Learning predictive partitions for continuous feature spaces

Björn Weghenkel, Laurenz Wiskott

Abstract
Any non-trivial agent (biological or algorithmic) that interacts with its environment needs some representation of its current state. Such a state should enable it to make informed decisions that lead to some desired outcome in the future. In practice, many learning algorithms assume states to come from a discrete set, while real-world learning problems are often continuous in nature. We propose an unsupervised learning algorithm that finds discrete partitions of a continuous feature space that are predictive with respect to the future. More precisely, the learned partitions induce a Markov chain on the data with high mutual information between the current state and the next state. Such predictive partitions can serve as an alternative to classical discretization algorithms in cases where the predictable time-structure of the data is of importance.

Manuscript from author [PDF]

ES2014-164

An adjustable p-exponential clustering algorithm

Valmir Macario, Francisco de Assis Tenório De Carvalho

Abstract
This paper proposes a new exponential clustering algorithm (XPFCM) by reformulating the clustering objective function with an additional parameter p that adjusts the exponential behavior of the membership assignment. The clustering experiments show that the proposed method assigns data to the clusters better than other fuzzy C-means (FCM) variants.

Manuscript from author [PDF]

ES2014-116

Self-organizing map for determination of goal candidates in mobile robot exploration

Jan Faigl, Petr Vaněk, Miroslav Kulich

Abstract
This paper addresses the problem of determining goal candidates in frontier-based mobile robot exploration. The proposed solution is based on a self-organizing map for the traveling salesman problem with neighborhoods, and it allows the exploration to be studied as a problem of repeated coverage of the current frontiers, where the minimal number of goal candidates is determined together with the expected cost of visiting the candidates. Early results enabled by the proposed self-organizing map-based solution indicate improved exploration for the proposed problem formulation. Thus, the presented work demonstrates how a neural network approach can provide interesting insights and a foundation for studying optimization problems arising in robotics.

Manuscript from author [PDF]

[Back to Top]


Regression, Forecasting and Extreme Learning Machines


ES2014-66

Parameter-free regularization in Extreme Learning Machines with affinity matrices

Leonardo Silvestre, André P. Lemos, João P. Braga, Antônio P. Braga

Abstract
This paper proposes a novel regularization approach for Extreme Learning Machines. Regularization is performed using a priori spatial information expressed by an affinity matrix. We show that the use of this type of a priori information is similar to performing Tikhonov regularization. Furthermore, if a parameter-free affinity matrix is used, like the cosine similarity matrix, regularization is performed without any need for parameter tuning. Experiments are performed using classification problems to validate the proposed approach.

Manuscript from author [PDF]

ES2014-90

Feature selection in environmental data mining combining Simulated Annealing and Extreme Learning Machine

Michael Leuenberger, Mikhaïl Kanevski

Abstract
Due to the large amount and complexity of data available in geosciences, machine learning nowadays plays an important role in environmental data mining. In many real data cases, we face the need to design the input space with the most relevant features. Because the main goal is to understand and find relationships between phenomena and features, feature selection is preferred to feature transformation or extraction. To deal with the high-dimensional space of environmental data, a wrapper method based on the Extreme Learning Machine and a global optimization algorithm (Simulated Annealing) is proposed. This paper investigates the whole methodology and shows promising results for environmental data feature selection and modelling.

Manuscript from author [PDF]

ES2014-79

Mobility Prediction Using Fully-Complex Extreme Learning Machines

Lahouari Ghouti

Abstract
Efficient planning and improved quality of service (QoS) in wireless networks call for the use of mobility prediction schemes. Such schemes ensure accurate mobility prediction of wireless users and units, which plays a major role in optimized planning and management of the available bandwidth and power resources. In this paper, fully-complex extreme learning machines (CELMs), known for their universal approximation capability, model and predict the mobility patterns of arbitrary nodes in a mobile ad hoc network (MANET). Unlike their real-valued counterparts, CELMs properly capture the existing interaction/correlation between the nodes' location coordinates, leading to more realistic and accurate prediction. Simulation results using standard mobility models and real-world mobility data clearly show that the proposed complex-valued prediction algorithm outperforms many existing real-valued learning machines in terms of prediction accuracy.

Manuscript from author [PDF]

ES2014-184

An Extreme Learning Approach to Active Learning

Euler Horta, Antônio P. Braga

Abstract
We propose in this paper a new active learning method that makes no assumptions about the data distribution and does not need any free parameter to be adjusted. The proposed algorithm is based on extreme learning machines (ELM) and a perceptron with analytical calculation of weights. We show that the proposed model achieves good results using a reduced set of patterns.

Manuscript from author [PDF]

ES2014-180

A new model selection approach for the ELM network using metaheuristic optimization

Ananda Freire, Guilherme Barreto

Abstract
We propose a novel approach for architecture selection and hidden neuron excitability improvement for the Extreme Learning Machine (ELM). Named the Adaptive Number of Hidden Neurons Approach (ANHNA), the proposed approach relies on a new general encoding scheme of the solution vector that automatically estimates the number of hidden neurons and adjusts their activation function parameters (slopes and biases). Due to its general nature, ANHNA's encoding scheme can be used by any metaheuristic algorithm for continuous optimization. Computer experiments were carried out using the Differential Evolution (DE) and Particle Swarm Optimization (PSO) metaheuristics, with promising results being achieved by the proposed method on benchmark regression problems.

Manuscript from author [PDF]

ES2014-29

Extreme learning machines for Internet traffic classification

Joseph Ghafari, Emmanuel Herbert, Stephane Senecal, Daniel Migault, Stanislas Francfort, Ting Liu

Abstract
Network packet transport services (namely the Internet) are subject to significant security issues. This paper aims to apply Machine Learning methods based on Neural Networks (Extreme Learning Machines, or ELM) to analyze Internet traffic in order to detect specific malicious activities. This is performed by classifying traffic for a key service run over the Internet: the Domain Name System (DNS). The ELM models and algorithms are run on DNS traffic data extracted from operating networks for botnet detection.

Manuscript from author [PDF]

ES2014-23

Electric load forecasting using wavelet transform and extreme learning machine

Song Li, Peng Wang, Lalit Goel

Abstract
This paper proposes a novel method for load forecasting that integrates the wavelet transform and the extreme learning machine. In order to capture more internal features, the wavelet transform is used to decompose the load series into a set of sub-components, which are more predictable. All the components are then processed separately by extreme learning machines. Numerical testing shows that the proposed method is able to improve the forecast performance at much lower computational cost compared with other benchmark methods.

Manuscript from author [PDF]

ES2014-133

Swim velocity profile identification through a Dynamic Self-adaptive Multiobjective Harmonic Search and RBF neural networks

Helon V. H. Ayala, Luciano F. da Cruz, Leandro Coelho, Roberto Z. Freire

Abstract
Technology has been successfully applied in sports, where biomechanical analysis is one of the most important areas used to raise the performance of athletes. In this context, this paper focuses on swim velocity profile identification using Radial Basis Function Neural Networks (RBF-NN) trained by Gustafson-Kessel clustering combined with a novel Dynamic Self-adaptive Multiobjective Harmony Search (DS-MOHS). One case study is analyzed, based on real data acquired from an elite female athlete swimming breaststroke. Better results are obtained by DS-MOHS when compared with standard multiobjective harmony search in terms of accuracy and generalization of the model.

Manuscript from author [PDF]

ES2014-173

Dynamic ensemble selection and instantaneous pruning for regression

Kaushala Dias, Terry Windeatt

Abstract
A novel dynamic method of selecting pruned ensembles of predictors for regression problems is presented. The proposed method, known henceforth as DESIP, enhances the prediction accuracy and generalization ability of pruning methods. Pruning heuristics attempt to combine accurate yet complementary members; DESIP therefore enhances performance by modifying the pruned aggregation, distributing the ensemble member selection over the entire dataset. Four static ensemble pruning approaches used in regression are compared to highlight the performance improvement yielded by the dynamic method. The experimental comparison is made using Multilayer Perceptron predictors on benchmark datasets.

Manuscript from author [PDF]

ES2014-178

Data normalization and supervised learning to assess the condition of patients with multiple sclerosis based on gait analysis

Samir Azrour, Sébastien Piérard, Pierre Geurts, Marc Van Droogenbroeck

Abstract
Gait impairment is considered an important feature of disability in multiple sclerosis, but its evaluation in the clinical routine remains limited. In this paper, we assess, by means of supervised learning, the condition of patients with multiple sclerosis based on their gait descriptors obtained with a gait analysis system. As the morphological characteristics of individuals influence their gait while being, in first approximation, independent of the disease level, an original strategy of data normalization with respect to these characteristics is described and applied beforehand in order to obtain more reliable predictions. In addition, we explain how we address the problem of missing data, which is a common issue in the field of clinical evaluation. Results show that, based on machine learning combined with the proposed data handling techniques, we can predict a score highly correlated with the condition of patients.

Manuscript from author [PDF]

ES2014-95

Sparse one hidden layer MLPs

Alberto Torres, David Díaz, José R. Dorronsoro

Abstract
We discuss how to build sparse one-hidden-layer MLPs, replacing the standard l2 weight decay penalty on all weights by an l1 penalty on the linear output weights. We propose an iterative two-step training procedure where the output weights are found using the FISTA proximal optimization algorithm to solve a Lasso-like problem, and the hidden weights are computed by unconstrained minimization. As we shall discuss, the procedure has a complexity equivalent to that of standard MLP training, yields MLPs with similar performance and, as a by-product, automatically selects the number of hidden units.

Manuscript from author [PDF]

ES2014-37

Multi-Step Ahead Forecasting of Road Condition Using Least Squares Support Vector Regression

Konsta Sirvio, Jaakko Hollmén

Abstract
Network-level multi-step road condition forecasting is an important step in accurate road maintenance planning, where the correct maintenance activities are defined in place and time over road networks. Forecasting methods have developed from engineering models to non-linear machine learning methods that make use of the collected condition and traffic data of the road network. Least Squares Support Vector Regression gives significantly the best results.

Manuscript from author [PDF]

[Back to Top]


Label noise in classification


ES2014-10

A comprehensive introduction to label noise

Benoît Frénay, Ata Kaban

Abstract
In classification, it is often difficult or expensive to obtain completely accurate and reliable labels. Indeed, labels may be polluted by label noise, due to e.g. insufficient information, expert mistakes, and encoding errors. The problem is that errors in training labels that are not properly handled may deteriorate the accuracy of subsequent predictions, among other effects. Many works have been devoted to label noise and this paper provides a concise and comprehensive introduction to this research topic. In particular, it reviews the types of label noise, their consequences and a number of state of the art approaches to deal with label noise.

Manuscript from author [PDF]

ES2014-185

Improving the Robustness of Bagging with Reduced Sampling Size

Maryam Sabzevari, Gonzalo Martínez-Muñoz, Alberto Suárez

Abstract
Bagging is a simple and robust classification algorithm in the presence of class label noise. This algorithm builds an ensemble of classifiers from bootstrap samples drawn with replacement, of size equal to the original training set. However, several studies have shown that this choice of sampling size is arbitrary in terms of the generalization performance of the ensemble. In this study we discuss how small sampling ratios can contribute to the robustness of bagging in the presence of class label noise. An empirical analysis on two datasets is carried out using different noise rates and bootstrap sampling sizes. The results show that, for the studied datasets, sampling rates of 20% clearly improve the performance of the bagging ensembles in the presence of class label noise.

Manuscript from author [PDF]

ES2014-72

Credal decision trees in noisy domains

Carlos J. Mantas, Joaquín Abellán

Abstract
Credal Decision Trees (CDTs) are algorithms to design classifiers based on imprecise probabilities and uncertainty measures. In this paper, the C4.5 and CDT procedures are combined into a new one, which depends on a parameter $s$. Several experiments are carried out with different values for $s$. C4.5 and the new procedure, varying the value of its parameter, are compared on data sets with different noise levels.

Manuscript from author [PDF]

ES2014-18

Finding Originally Mislabels with MD-ELM

Anton Akusok, David Veganzones, Yoan Miché, Eric Severin, Amaury Lendasse

Abstract
This paper presents a method which aims at detecting mislabeled samples, with a practical example in the field of bankruptcy prediction. Mislabeled samples are found in many classification problems and can bias the training of the desired classifier. This paper proposes a new method based on the Extreme Learning Machine (ELM) which allows for identification of the most probably mislabeled samples. Two datasets are used in order to validate and test the proposed methodology: a toy example (the XOR problem) and a real dataset from corporate finance (bankruptcy prediction).

Manuscript from author [PDF]

ES2014-118

Misclassification of class C G-protein-coupled receptors as a label noise problem

Caroline König, Alfredo Vellido Alcacena, René Alquézar Mancho, Jesús Giraldo

Abstract
G-Protein-Coupled Receptors (GPCRs) are cell membrane proteins of relevance to biology and pharmacology. Their supervised classification in subtypes is hampered by label noise, which stems from a combination of expert knowledge limitations and lack of clear correspondence between labels and different representations of the protein primary sequences. In this brief study, we describe a systematic approach to the analysis of GPCR misclassifications using Support Vector Machines and use it to assist the discovery of database labeling quality problems and investigate the extent to which GPCR sequence physicochemical transformations reflect GPCR subtype labeling. The proposed approach could enable a filtering approach to the label noise problem.

Manuscript from author [PDF]

ES2014-169

A multi-class extension for multi-labeler support vector machines

Diego H. Peluffo-Ordonez, Santiago Murillo-Rendón, Julián Arias-Londoño, Germán Castellanos-Dominguez

Abstract
In recent years, there has been an increasing interest in the design of pattern recognition systems able to deal with labels coming from multiple sources. To avoid bias during the learning process, in some applications it is strongly recommended to learn from a set of panelists or experts instead of only one. In particular, two aspects are of interest, namely: discriminating between confident and unconfident labelers, and determining the suitable ground truth. This work presents an extension of a previous work, consisting of a generalization of the two-class case via a modified one-against-all approach. This approach uses modified classifiers able to learn from multi-labeler settings, within a soft-margin support vector machine framework. The proposed method provides ranking values for panelists as well as an estimate of the ground truth.

Manuscript from author [PDF]

[Back to Top]