ESANN2013

21st European Symposium on Artificial Neural Networks
Bruges, Belgium, April 24-25-26

Content of the proceedings


Machine Learning Methods for Processing and Analysis of Hyperspectral Data


ES2013-9

Processing Hyperspectral Data in Machine Learning

Thomas Villmann, Marika Kästner, Andreas Backhaus, Udo Seiffert

Abstract
The adaptive and automated analysis of hyperspectral data is mandatory in many areas of research such as physics, astronomy and geophysics, chemistry, bioinformatics, medicine, biochemistry, engineering, and others. Hyperspectra differ from other spectral data in that a large frequency range is uniformly sampled. The resulting discretized spectra have a huge number of spectral bands and can be seen as good approximations of the underlying continuous spectra. The large dimensionality causes numerical difficulties in efficient data analysis. Another aspect to deal with is that the amount of data may range from several billion samples in geophysics to only a few in medical applications. In consequence, dedicated machine learning algorithms and approaches are required for precise yet efficient processing of hyperspectral data; these should also incorporate expert knowledge of the application domain as well as the mathematical properties of the hyperspectral data.

Manuscript from author [PDF]

ES2013-34

Multi-view feature extraction for hyperspectral image classification

Michele Volpi, Giona Matasci, Mikhaïl Kanevski, Devis Tuia

Abstract
We study the multi-view feature extraction (MV-FE) framework for the classification of hyperspectral images acquired from airborne and spaceborne sensors. This type of data is naturally composed of distinct blocks of spectral channels, forming the hypercube. To reduce the dimensionality of the data by taking advantage of this particular structure, an unsupervised multi-view feature extraction method is applied prior to classification. First, a technique to automatically obtain the blocks, based on the global spectral correlation matrix, is applied. Then, kernel canonical correlation analysis is performed in a multi-view setting (MV-kCCA) to find projections of the data blocks in a correlated subspace, thus gaining discriminant power. Experiments using the linear discriminant classifier (LDA) show the appropriateness of adopting MV-FE prior to classification, which outperforms standard approaches.

Manuscript from author [PDF]

ES2013-54

Regularization in relevance learning vector quantization using l1-norms

Martin Riedel, Fabrice Rossi, Marika Kästner, Thomas Villmann

Abstract
We propose in this contribution a method for $l_{1}$-regularization in prototype based relevance learning vector quantization (LVQ) for sparse relevance profiles. Sparse relevance profiles in hyperspectral data analysis fade out those spectral bands which are not necessary for classification. In particular, we consider the sparsity in the relevance profile enforced by LASSO optimization. The latter is obtained by a gradient learning scheme using a differentiable parametrized approximation of the $l_{1}$-norm, which has an upper error bound. We also extend this regularization idea to the matrix learning variant of LVQ as the natural generalization of relevance learning.
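The abstract does not state which parametrized approximation of the $l_{1}$-norm is used. As an illustration only, one widely used smooth surrogate (an assumption here, with a hypothetical smoothing parameter $\epsilon$ and relevance vector $\lambda$) is shown below, together with its per-component error bound and its gradient, which is what a gradient learning scheme would need:

```latex
\[
\|\lambda\|_{1} = \sum_{k} |\lambda_{k}|
\;\approx\; \sum_{k} \sqrt{\lambda_{k}^{2} + \epsilon},
\qquad \epsilon > 0
\]
\[
0 \;\le\; \sqrt{x^{2}+\epsilon} - |x|
\;=\; \frac{\epsilon}{\sqrt{x^{2}+\epsilon} + |x|}
\;\le\; \sqrt{\epsilon}
\]
\[
\frac{\partial}{\partial x}\sqrt{x^{2}+\epsilon}
\;=\; \frac{x}{\sqrt{x^{2}+\epsilon}}
\]
```

With this surrogate the total approximation error over a $D$-dimensional relevance profile is at most $D\sqrt{\epsilon}$, and the gradient is defined everywhere, including at $x = 0$.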

Manuscript from author [PDF]

Recurrent networks and modeling


ES2013-4

Mixed order associative networks for function approximation, optimisation and sampling

Kevin Swingler, Leslie Smith

Abstract
A mixed order associative neural network with n neurons and a modified Hebbian learning rule can learn any function f: {-1,1}^n → R and reproduce its output as the network's energy function. The network weights are equal to Walsh coefficients, the fixed point attractors are local maxima in the function, and partial sums across the weights of the network calculate averages for hyperplanes through the function. If the network is trained on data sampled from a distribution, then marginal and conditional probability calculations may be made and samples from the distribution generated from the network. These qualities make the network ideal for optimisation fitness function modelling and make the relationships amongst variables explicit in a way that architectures such as the MLP do not.
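The link between weights and Walsh coefficients can be sketched as follows. This is not the paper's Hebbian learning rule: it computes the Walsh coefficients directly by exhaustive averaging, which is only feasible for small n, and then evaluates the function as a weighted sum of products of neuron states. The names `walsh_coefficients`, `energy` and the example function `f` are hypothetical.

```python
import itertools
import math


def walsh_coefficients(f, n):
    """Return {subset_of_indices: coefficient} for f defined on {-1, 1}^n."""
    points = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for order in range(n + 1):
        for subset in itertools.combinations(range(n), order):
            # Coefficient = average of f(x) * prod_{i in subset} x_i over all inputs.
            total = sum(f(x) * math.prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def energy(coeffs, x):
    """Weighted sum of state products; with Walsh weights this reproduces f(x)."""
    return sum(w * math.prod(x[i] for i in subset) for subset, w in coeffs.items())


def f(x):
    # Example function with a second-order term, a first-order term and a bias.
    return 3.0 * x[0] * x[1] - 2.0 * x[2] + 0.5


weights = walsh_coefficients(f, 3)
```

Each key of `weights` is the set of neurons joined by one (possibly higher-order) connection, so the second-order weight for neurons 0 and 1 recovers the interaction term of `f`, and the empty subset holds the mean of the function.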

Manuscript from author [PDF]

ES2013-50

Auto-encoder pre-training of segmented-memory recurrent neural networks

Stefan Glüge, Ronald Böck, Andreas Wendemuth

Abstract
The extended Backpropagation Through Time (eBPTT) learning algorithm for Segmented-Memory Recurrent Neural Networks (SMRNNs) still lacks the ability to reliably learn long-term dependencies. The alternative learning algorithm, extended Real-Time Recurrent Learning (eRTRL), does not suffer from this problem but is computationally very inefficient, such that it is impractical for the training of large networks. The positive results reported with the pre-training of deep neural networks give rise to the hope that SMRNNs could also benefit from a pre-training procedure. In this paper, we introduce a layer-local pre-training procedure for SMRNNs. Using the information latching problem as a benchmark task, the comparison of randomly initialised and pre-trained networks shows the beneficial effect of the unsupervised pre-training. It significantly improves the learning of long-term dependencies in the supervised eBPTT training.

Manuscript from author [PDF]

ES2013-47

Error entropy criterion in echo state network training

Levy Boccato, Daniel G. Silva, Denis Fantinato, Kenji Nose Filho, Rafael Ferrari, Romis Attux, Aline Neves, Jugurta Montalvão, João Marcos T. Romano

Abstract
Echo state networks offer a promising possibility for an effective use of recurrent structures, as the presence of feedback is accompanied by a relatively simple training process. However, such simplicity, which is obtained through the use of an adaptive linear readout that minimizes the mean-squared error, limits the capability of exploring the statistical information of the involved signals. In this work, we apply an information-theoretic learning framework, based on the error entropy criterion, to the ESN training, in order to improve the performance of the neural model, whose advantages are analyzed in the context of the supervised channel equalization problem.

Manuscript from author [PDF]

ES2013-94

Perceptual grouping through competition in coupled oscillator networks

Martin Meier, Robert Haschke, Helge Ritter

Abstract
In this paper we present a novel approach to model perceptual grouping based on synchronization in a network of coupled oscillators. To this end, the concept of excitatory and inhibitory connections between recurrent neurons is transferred from the Competitive Layer Model (CLM) to a network of Kuramoto oscillators, which realizes grouping by phase and frequency synchronization. While preserving the excellent grouping capabilities of the CLM, this approach boosts the computational performance (due to its simplicity), which is verified in several experiments.
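The grouping effect of mixed excitatory and inhibitory coupling can be illustrated with the standard Kuramoto phase equation. The sketch below is a generic toy model, not the authors' network: the coupling values, the Euler integration, identical natural frequencies and the names `simulate_kuramoto` and `phase_gap` are all assumptions, and only phase (not frequency) synchronization is shown.

```python
import math


def simulate_kuramoto(coupling, omega, theta0, dt=0.01, steps=5000):
    """Euler-integrate d(theta_i)/dt = omega_i + sum_j K_ij * sin(theta_j - theta_i)."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        rates = [
            omega[i]
            + sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [t + dt * r for t, r in zip(theta, rates)]
    return theta


def phase_gap(a, b):
    """Absolute phase difference wrapped into [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


# Two intended groups: oscillators 0 and 1, and oscillators 2 and 3.
groups = [0, 0, 1, 1]
# Excitatory (+1) coupling within a group, inhibitory (-1) coupling between groups.
coupling = [
    [0.0 if i == j else (1.0 if groups[i] == groups[j] else -1.0) for j in range(4)]
    for i in range(4)
]
theta = simulate_kuramoto(coupling, omega=[1.0] * 4, theta0=[0.0, 1.0, 2.0, 4.0])
```

In this toy setting, oscillators in the same group end up with the same phase, while the two groups settle half a cycle apart, so group membership can be read off from the phases.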

Manuscript from author [PDF]

ES2013-106

Using Wikipedia with associative networks for document classification

Niels Bloom, Mariet Theune, Franciska de Jong

Abstract
We demonstrate a new technique for building associative networks based on Wikipedia, comparing them to WordNet-based associative networks that we used previously, finding the Wikipedia-based networks to perform better at document classification. Additionally, we compare the performance of associative networks to various other text classification techniques using the Reuters-21578 dataset, establishing that associative networks can achieve comparable results.

Manuscript from author [PDF]

ES2013-7

Automated operational states detection for drilling systems control in critical conditions

Galina Veres, Zoheir Sabeur

Abstract
Critical events in industrial drilling should be overcome by engineers while they maintain safety and achieve their operational drilling plans. Complex geophysical drilling requires maximum awareness of critical situations such as “Kicks”, “Fluid loss” or “Stuck pipe”. These may compromise safety and potentially halt operations, requiring the rapid evacuation of staff from rigs. In this paper, a robust method for the detection of operational states is proposed. Specifically, Echo State Networks (ESNs) were benchmarked and tested rigorously despite challenging training datasets that exhibited class-imbalance issues. These issues were overcome, leading to good ESN performance.

Manuscript from author [PDF]

ES2013-27

Analysis of Synaptic Weight Distribution in an Izhikevich Network

Li Guo, Zhijun Yang, Qingbao Zhu

Abstract
The Izhikevich network is a relatively new neuronal network, which consists of cortical spiking model neurons with axonal conduction delays and spike-timing-dependent plasticity (STDP) with hard bound adaptation. In this work, we use uniform and Gaussian distributions respectively to initialize the weights of all excitatory neurons. After the network undergoes a few minutes of STDP adaptation, we can see that the weights of all synapses in the network, for both initial weight distributions, form a bimodal distribution, and numerically the established distribution presents dynamic stability.

Manuscript from author [PDF]

ES2013-23

Percolation model of axon guidance

Gaetano Liborio Aiello, Valentino Romano

Abstract
In the developing brain, neurons interconnect via the action of molecules that guide the axon to its targets, thus allowing the proper wiring scheme to emerge. It is not fully understood whether the underlying mechanism is wholly deterministic or not. The existence of “choice-points” and “decision-regions” suggests that options are available to the growth cone. The guidance mechanism is here simulated by equating the axonal trajectory to that of a trickle of ground water seeping through a bed of sand. Decision regions are implemented by assigning each site of the percolation lattice a set of probabilities ruling the possible moves.

Manuscript from author [PDF]

ES2013-5

Efficient VLSI Architecture for Spike Sorting Based on Generalized Hebbian Algorithm

Wen-Jyi Hwang, Hao Chen

Abstract
A novel hardware architecture for fast spike sorting is presented in this paper. The architecture is able to perform feature extraction based on the Generalized Hebbian Algorithm (GHA). The employment of GHA allows efficient computation of principal components for subsequent clustering and classification operations. The hardware implementation of GHA features high throughput, low power dissipation, and low area costs. The proposed architecture is implemented on a Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining low hardware resource utilization and high speed computation.

Manuscript from author [PDF]

Dimensionality reduction


ES2013-46

Soft rank neighbor embeddings

Marc Strickert, Kerstin Bunte

Abstract
Correlation-based multidimensional scaling is proposed for reconstructing pairwise dissimilarity or score relationships in a Euclidean space. Pearson correlation between pairs of objects in source and target space can be directly maximized by gradient methods, while gradient optimization of Spearman rank correlation profits from a numerically soft formulation introduced in this work. Scale and shift invariance properties of correlation help circumvent typical distance concentration problems.

Manuscript from author [PDF]

ES2013-99

Multiple Kernel Self-Organizing Maps

Madalina Olteanu, Nathalie Villa-Vialaneix, Christine Cierco-Ayrolles

Abstract
In a number of real-life applications, the user is interested in analyzing several sources of information together: a graph together with additional information known on its nodes, numerical variables measured on individuals together with factors describing these individuals... The combination of all the sources of information can help the user to better understand the dataset as a whole. The present article focuses on such an issue, by using self-organizing maps. Using a kernel version of the algorithm makes it possible to combine various types of information (graph, numerical values, factors, strings...) and to automatically find a good trade-off between all sources of data, by using an automated procedure to tune the data combination. This approach is illustrated on several examples.

Manuscript from author [PDF]

ES2013-38

Semi-Supervised Vector Quantization for proximity data

Xibin Zhu, Frank-Michael Schleif, Barbara Hammer

Abstract
Semi-supervised learning (SSL) is focused on learning from labeled and unlabeled data by incorporating structural and statistical information of the available unlabeled data. The amount of data is dramatically increasing, but little of it is fully labeled, due to cost and time constraints. Even more challenging are non-vectorial, so-called proximity data, with data given by pairwise proximity values, like score-values in sequence alignments, having no regular vector-space representation. Only a few methods provide SSL for this data, limited to positive-semi-definite (psd) data. They also lack interpretable models, which is a relevant aspect in life-sciences where most of these data are found. This paper provides a prototype based SSL approach for proximity data.

Manuscript from author [PDF]

ES2013-66

Sensitivity to parameter and data variations in dimensionality reduction techniques

Francisco J. García-Fernández, Michel Verleysen, John A. Lee, Ignacio Díaz

Abstract
Dimensionality reduction techniques aim at representing high-dimensional data in a meaningful and lower dimensional space, improving the human comprehension and interpretation of data. In recent years, newer nonlinear techniques have been proposed in order to address the limitation of linear techniques. This paper presents a study of the stability of some of these dimensionality reduction techniques, analyzing their behavior under changes in the parameters and the data. The performances of these techniques are investigated on artificial datasets. The paper presents these results by identifying the weaknesses of each technique, and suggests some data-processing tasks to improve the stability.

Manuscript from author [PDF]

Image, signal and time series analysis


ES2013-112

A nuclear-norm based convex formulation for informed source separation

Augustin Lefèvre, François Glineur, P.A. Absil

Abstract
We study the problem of separating audio sources from a single linear mixture. The goal is to find a decomposition of the single channel spectrogram into a sum of individual contributions associated to a certain number of sources. In this paper, we consider an informed source separation problem in which the input spectrogram is partly annotated. We propose a convex formulation that relies on a nuclear norm penalty to induce low rank for the contributions. We show experimentally that solving this model with a simple subgradient method outperforms a previously introduced nonnegative matrix factorization (NMF) technique, both in terms of source separation quality and computation time.

Manuscript from author [PDF]

ES2013-56

Frequency-Dependent Peak-Over-Threshold algorithm for fault detection in the spectral domain

Aurélien Hazan, Kurosh Madani

Abstract
An original novelty detection algorithm in the Fourier domain, using extreme value theory (EVT), is considered in this article. Periodograms may be considered as frequency-dependent random variables, and this can be taken into account when designing statistical tests. Frequency-Dependent Peak-Over-Threshold (FDPOT) puts special emphasis on the frequency dependence of extreme value statistics, thanks to Vector Generalized Additive Models (VGAM) estimation. An application is discussed in the field of mechanical vibrations. It is first shown that performance increases compared to POT detection. Then FDPOT is compared to state-of-the-art algorithms such as KPCA.

Manuscript from author [PDF]

ES2013-82

Activity Date Estimation in Timestamped Interaction Networks

Fabrice Rossi, Pierre Latouche

Abstract
We propose in this paper a new generative model for graphs that uses a latent space approach to explain timestamped interactions. The model is designed to provide global estimates of activity dates in historical networks where only the interaction dates between agents are known with reasonable precision. Experimental results show that the model provides better results than local averages in dense enough networks.

Manuscript from author [PDF]

ES2013-60

Novelty detection in image recognition using IRF Neural Networks properties

Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban

Abstract
Image Receptive Fields Neural Network (IRF-NN) is a variant of feedforward multi-layer perceptrons adapted to image recognition. It shows very fast training as well as robust and accurate results on supervised classification tasks. This paper presents another property of IRF-NN: responses of trained networks can be analysed to detect unknown images. Several discriminative and efficient novelty criteria are introduced and tested successfully on the ALOI image dataset. A combination of novelty detection and object recognition is illustrated with a robust, pose invariant application of multi-object localization in various backgrounds.

Manuscript from author [PDF]

ES2013-42

Non-Euclidean independent component analysis and Oja's learning

Mandy Lange, Michael Biehl, Thomas Villmann

Abstract
In the present contribution we tackle the problem of nonlinear independent component analysis by non-Euclidean Hebbian-like learning. Independent component analysis (ICA) and blind source separation were originally introduced as tools for the linear unmixing of signals to detect the underlying sources. Hebbian methods became very popular and successful in this context. Many nonlinear ICA extensions are known. A promising strategy is the application of kernel mapping. Kernel mapping realizes a usually nonlinear but implicit mapping of the data into a reproducing kernel Hilbert space. After that, a linear demixing can be carried out there. However, explicit handling in this non-Euclidean kernel mapping space is impossible. We show in this paper an alternative using an isomorphic mapping space. In particular, we show that the idea of Hebbian-like learning of kernel ICA can be transferred to this non-Euclidean space, realizing a non-Euclidean ICA.

Manuscript from author [PDF]

ES2013-48

Automatic Singular Spectrum Analysis for Time-Series Decomposition

Andres Marino Alvarez-Meza, Carlos Daniel Acosta-Medina, Germán Castellanos-Dominguez

Abstract
An automatic singular spectrum analysis (SSA) based methodology is proposed to decompose and reconstruct time-series. We suggest a clustering based procedure to decompose the main dynamics of the input signal. A subset of orthogonal basis vectors computed from the input is selected using a power based criterion. Then, the selected basis vectors are represented by a discrete Fourier transform, to identify basis vectors encoding similar data structures, which are employed to infer the hidden components of the signal. Our approach is tested over some synthetic and real-world datasets, showing that our algorithm is a good tool to interpret and decompose time-series.
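For orientation, the basic (non-automatic) SSA decomposition that such a methodology builds on can be sketched with NumPy as below. The power-based selection and the clustering of basis vectors described in the abstract are not implemented; the function name `ssa_components`, the window length and the test signal are assumptions chosen for illustration.

```python
import numpy as np


def ssa_components(series, window):
    """Basic SSA: embed, take the SVD, and diagonally average each rank-one term."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = []
    for idx in range(s.size):
        elem = s[idx] * np.outer(u[:, idx], vt[idx])
        # Diagonal averaging: entry (i, j) of elem contributes to time index i + j.
        flipped = elem[::-1]
        comp = [flipped.diagonal(d - window + 1).mean() for d in range(n)]
        components.append(comp)
    return np.array(components), s


# Hypothetical test signal: linear trend plus one sinusoid.
t = np.arange(200)
x = 0.02 * t + np.sin(2 * np.pi * t / 20)
components, singular_values = ssa_components(x, window=40)
```

Summing all rows of `components` recovers the input series; grouping rows (for example, by similarity of their spectra) is the step that an automatic method has to decide on.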

Manuscript from author [PDF]

ES2013-64

Dimension reduction for individual ICA to decompose fMRI during real-world experiences: principal component analysis vs. canonical correlation analysis

Valeri Tsatsishvili, Fengyu Cong, Tuomas Puoliväli, Vinoo Alluri, Petri Toiviainen, Asoke K. Nandi, Elvira Brattico, Tapani Ristaniemi

Abstract
Data analysis for functional magnetic resonance imaging collected during real-world experiences is critical. Independent component analysis (ICA) has been used to extract desired spatial maps. Before ICA, dimension reduction is used to separate the signal and the noise subspaces. Recently, in addition to the widely used Principal component analysis (PCA) and model order selection, canonical correlation analysis (CCA) has been exploited to find the correlated and uncorrelated subspaces between two datasets. This study compares CCA and PCA for dimension reduction for ICA to decompose very noisy fMRI elicited by natural and continuous music. We find that their performances are comparable.

Manuscript from author [PDF]

ES2013-45

Machine Learning Techniques for Short-Term Electric Power Demand Prediction

Fernando Mateo, Juan J. Carrasco, Mónica Millán-Giraldo, Abderrahim Sellami, Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas

Abstract
For several years, power consumption forecasting has attracted considerable attention from the scientific community. Although there exist several works that deal with this issue, it remains open. Good management of energy consumption in HVAC (Heating, Ventilation and Air Conditioning) systems for large households and public buildings may contribute to sustainable development in terms of economy and environmental preservation. In this paper, several Machine Learning techniques are evaluated and compared with a linear technique (Robust Multiple Linear Regression) and a naïve method. All methods have been applied to five buildings of the University of León (Spain); the results indicate that nonlinear techniques outperform the linear one in most scenarios.

Manuscript from author [PDF]

ES2013-6

Unsupervised non-linear neural networks capture aspects of floral choice behaviour

Levente Orbán, Sylvain Chartier

Abstract
Two unsupervised neural networks were tested to understand the extent to which they capture elements of bumblebees’ unlearned preferences towards flower-like visual properties. The networks, which are based on Independent Component Analysis and Feature-Extracting Bidirectional Associative Memory, use images of test patterns that are identical to the ones used in behavioural studies. While both models show consistency with behavioural results, the ICA model matches behavioural results substantially better in terms of image reconstruction quality of radial and concentric patterns, and foliage background. Both models generated a novel prediction of an interaction between spatial frequency and symmetry. These results are interpreted to support the hypothesis that flower displays are adapted to pollinators’ information processing constraints.

Manuscript from author [PDF]

Feature selection


ES2013-117

GA-KDE-Bayes: an evolutionary wrapper method based on non-parametric density estimation applied to bioinformatics problems

Maria Fernanda Wanderley, Vincent Gardeux, René Natowicz, Antônio Braga

Abstract
This paper presents an evolutionary wrapper method for feature selection that uses a non-parametric density estimation method and a Bayesian Classifier. Non-parametric methods are a good alternative for scarce and sparse data, as in Bioinformatics problems, since they do not make any assumptions about its structure and all the information comes from the data itself. Results show that local modeling provides small and relevant subsets of features when compared to results available in the literature.

Manuscript from author [PDF]

ES2013-77

Risk Estimation and Feature Selection

Gauthier Doquire, Benoît Frénay, Michel Verleysen

Abstract
For classification problems, the risk is often the criterion to be eventually minimised. It can thus naturally be used to assess the quality of feature subsets in feature selection. However, in practice, the probability of error is often unknown and must be estimated. Also, mutual information is often used as a criterion to assess the quality of feature subsets, since it can be seen as an imperfect proxy for the risk and can be reliably estimated. In this paper, two different ways to estimate the risk using the Kozachenko-Leonenko probability density estimator are proposed. The resulting estimators are compared on feature selection problems with a mutual information estimator based on the same density estimator. Along the line of our previous works, experiments show that using an estimator of either the risk or the mutual information gives similar results.

Manuscript from author [PDF]

ES2013-67

Random Brains: An ensemble method for feature selection with neural networks

Mark Embrechts, Jonathan Linton, Jorge Santos

Abstract
The purpose of this paper is to introduce and validate Random Brains, a novel artificial neural network based feature selection technique. Feature selection is widely used in high-dimensional data and it aims at removing irrelevant or redundant data, providing faster predictors without a significant decrease in model performance. Random Brains, inspired by Breiman’s Random Forests, are bagged ensembles of predictive neural network models that use randomly selected subsets of features. This paper validates Random Brains on several classification and regression benchmark data sets by comparing its performance to similar models with features selected based on sensitivity analysis.

Manuscript from author [PDF]

ES2013-41

A distributed wrapper approach for feature selection

Veronica Bolon-Canedo, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos

Abstract
In recent years, distributed learning has been the focus of much attention due to the proliferation of big databases, usually distributed. In this context, machine learning can take advantage of feature selection methods to deal with these datasets of high dimensionality. However, the great majority of current feature selection algorithms are designed for centralized learning. To confront the problem of distributed feature selection, in this paper we propose a distributed wrapper approach. In this manner, the learning accuracy can be improved while reducing memory requirements and execution time. Four representative datasets were selected to test the approach, paving the way to its application to extremely high-dimensional data, which previously prevented the use of wrapper approaches.

Manuscript from author [PDF]

ES2013-52

Feature Selection for Footwear Shape Estimation

Fernando Mateo, Mónica Millán-Giraldo, Juan J. Carrasco, Enrique Montiel, Jose A. Bernabeu, José D. Martín-Guerrero

Abstract
This study proposes feature selection techniques to obtain a set of significant foot anthropometric measurements that can assist customers in the choice of footwear size and width. The results given by a number of methods are averaged to provide a reliable set of features. Several machine learning methods are used to evaluate the classification (for the width) and regression (for the size) accuracies before and after feature selection. The results prove the benefits of carrying out feature selection, especially for the shoe width.

Manuscript from author [PDF]

ES2013-116

Efficient prediction of x-axis intercepts of discrete impedance spectra

Thomas Schmid, Dorothee Günzel, Martin Bogdan

Abstract
In impedance spectroscopy of epithelial cell layers, it is a common task to extrapolate discrete two-dimensional plots in order to determine electrical properties associated with axis intercepts. Here, we investigate how implicit properties of such curves can be used to predict the x-axis intercept where explicitly determined properties fail to do so. We perform feature extraction, algorithmic feature ranking and dimension reduction on model impedance spectra derived from a tissue-equivalent electric circuit. Selected feature subsets are assessed by training artificial neural networks to predict the intercept. Results show that subsets of three or fewer implicit features provide a reasonable basis for predictions.

Manuscript from author [PDF]

ES2013-58

Evolutionary computation based system decomposition with neural networks

Robert Kaltenhaeuser, Erik Schaffernicht, Frank-Florian Steege, Horst-Michael Gross

Abstract
We present an evolutionary approach to divide a complex control system into smaller sub-systems with the help of neural networks. To this end, measured channels are partitioned into several disjoint sets, representing possible sub-problems, while the networks are used to assess the quality of the resulting decomposition. We show that this approach is well suited to calculate correct decompositions of complex control systems. Furthermore, the obtained neural networks are used to predict important process factors with considerably better approximation quality than monolithic approaches that have to deal with all input channels in parallel.

Manuscript from author [PDF]

Reinforcement learning, control and optimization


ES2013-100

Fast online adaptivity with policy gradient: example of the BCI "P300"-speller

Emmanuel Daucé, Timothée Proix, Liva Ralaivola

Abstract
We tackle the problem of reward-based online learning of multiclass classifiers and consider a policy gradient ascent to solve this problem in the linear case. We apply it to the online adaptation of an EEG-based "P300"-speller. When applied from scratch, a robust classifier is obtained in a few steps.

Manuscript from author [PDF]

ES2013-73

Locally Weighted Least Squares Temporal Difference Learning

Matthew Howard, Yoshihiko Nakamura

Abstract
This paper introduces locally weighted temporal difference learning for evaluation of a class of policies whose value function is non-linear in the state. Least squares temporal difference learning is used for training local models according to a distance metric in state-space. Empirical evaluations are reported demonstrating learning performance on a number of strongly non-linear value functions, without the need for prior knowledge of features or a specific functional form.
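As background for the abstract above, a plain (global, not locally weighted) LSTD policy-evaluation step is sketched below. The locality weighting and distance metric of the paper are not modelled; the function name `lstd`, the small ridge term, the one-hot features and the three-state chain are assumptions made for a self-checking example.

```python
import numpy as np


def lstd(transitions, features, gamma, ridge=1e-8):
    """Solve A w = b with A = sum phi (phi - gamma * phi')^T and b = sum r * phi.

    transitions: iterable of (state, reward, next_state, done) tuples.
    features:    function mapping a state to a 1-D NumPy feature vector.
    """
    transitions = list(transitions)
    k = len(features(transitions[0][0]))
    a_matrix = ridge * np.eye(k)  # small ridge term keeps the system well-posed
    b_vector = np.zeros(k)
    for state, reward, next_state, done in transitions:
        phi = features(state)
        # Terminal transitions bootstrap from a zero value.
        phi_next = np.zeros(k) if done else features(next_state)
        a_matrix += np.outer(phi, phi - gamma * phi_next)
        b_vector += reward * phi
    return np.linalg.solve(a_matrix, b_vector)


def one_hot(state, n_states=3):
    vec = np.zeros(n_states)
    vec[state] = 1.0
    return vec


# Deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
chain = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, None, True)]
w = lstd(chain, one_hot, gamma=0.9)
```

With one-hot features the weight vector is the value function itself, so `w` is approximately `[0.81, 0.9, 1.0]` for this chain. A locally weighted variant would, under the usual construction, multiply each transition's contribution by a weight that depends on its distance to a query state.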

Manuscript from author [PDF]

ES2013-26

Learning control under uncertainty: A probabilistic Value-Iteration approach

Bastian Bischoff, Duy Nguyen-Tuong, Heiner Markert, Alois Knoll

Abstract
In this paper, we introduce a probabilistic version of the well-studied Value-Iteration approach, i.e. Probabilistic Value-Iteration (PVI). The PVI approach can handle continuous states and actions in an episodic Reinforcement Learning (RL) setting, while using Gaussian Processes to model the state uncertainties. We further show how the approach can be efficiently realized, making it suitable for learning with large data. The proposed PVI is evaluated on a benchmark problem, as well as on a real robot for learning a control task. A comparison of PVI with two state-of-the-art RL algorithms shows that the proposed approach is competitive in performance while being efficient in learning.

Manuscript from author [PDF]

ES2013-93

Ensembles for Continuous Actions in Reinforcement Learning

Siegmund Duell, Steffen Udluft

Abstract
Data-efficient reinforcement learning methods allow controllers (policies) for complex technical systems to be optimized in a data-driven manner. Still, there is the risk that, when running such a policy on the real system, it performs considerably worse than expected. For policies with discrete actions, it has been shown that this risk can be reduced considerably when, instead of a single policy that might be inferior by chance, a whole ensemble of policies is used to select the final policy by an aggregation such as majority voting. In this paper we extend the applicability of the ensemble approach to vector-valued, continuous actions.

Manuscript from author [PDF]

ES2013-68

An empirical analysis of reinforcement learning using design of experiments

Christopher Gatti, Mark Embrechts, Jonathan Linton

Abstract
This study uses a design of experiments approach to understand how a neural network learns the mountain car domain using reinforcement learning. A large experiment is first performed to characterize the probability of empirical convergence based on three reinforcement learning algorithm parameters (λ, γ, ε), and a logistic regression model is fitted to this data. A detailed analysis of a subset of the parameter space finds that, upon convergence, algorithm parameters have significant effects on the convergence speed and mean performance, though performance differences are minimal.

Manuscript from author [PDF]

ES2013-19

Hierarchical Reinforcement Learning for Robot Navigation

Bastian Bischoff, Duy Nguyen-Tuong, I-Hsuan Lee, Felix Streichert, Alois Knoll

Abstract
For complex tasks, such as manipulation and robot navigation, reinforcement learning (RL) is well known to be difficult due to the curse of dimensionality. To overcome this complexity and make RL feasible, hierarchical RL (HRL) has been suggested. The basic idea of HRL is to divide the original task into elementary subtasks, which can be learned using RL. In this paper, we propose an HRL architecture for learning robot movements, e.g. robot navigation. The proposed HRL consists of two layers: (i) movement planning and (ii) movement execution. In the planning layer, e.g. generating navigation trajectories, discrete RL is employed while using movement primitives. Given the movement planning and corresponding primitives, the policy for the movement execution can be learned in the second layer using continuous RL. The proposed approach is implemented and evaluated on a mobile robot platform for a navigation task.

Manuscript from author [PDF]

ES2013-2

Least-squares temporal difference learning based on extreme learning machine

Pablo Escandell-Montero, José M. Martínez-Martínez, José D. Martín-Guerrero, Emilio Soria-Olivas, Juan Gómez-Sanchis

Abstract
This paper proposes a least-squares temporal difference (LSTD) algorithm based on extreme learning machine that uses a single-hidden-layer feedforward network to approximate the value function. While LSTD is typically combined with local function approximators, the proposed approach uses a global approximator, which offers better scalability. The results of the experiments carried out on four Markov decision processes show the usefulness of the proposed approach.
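
As a reading aid, the textbook form of LSTD with a fixed feature map can be written as follows. This is a sketch under assumptions, not the authors' exact formulation: the symbols W, c, g, θ, A and b are introduced here for illustration, the hidden layer follows the usual extreme learning machine convention of random, untrained weights, and any regularization used in the paper is omitted.

```latex
% Hidden-layer features with random, fixed weights W and biases c (ELM convention)
\phi(s) = g(W s + c)

% Standard LSTD statistics over observed transitions (s_t, r_t, s_{t+1}) with discount \gamma
A = \sum_{t} \phi(s_t)\,\bigl(\phi(s_t) - \gamma\,\phi(s_{t+1})\bigr)^{\top},
\qquad
b = \sum_{t} \phi(s_t)\, r_t

% Output weights and the resulting value estimate
\theta = A^{-1} b,
\qquad
\hat{V}(s) = \theta^{\top} \phi(s)
```

Only the output weights θ are fitted, which is what allows the least-squares solution to be obtained in closed form.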

Manuscript from author [PDF]

ES2013-91

Binary particle swarm optimisation with improved scaling behaviour

Denise Gorse

Abstract
A Boolean particle swarm optimisation (PSO) algorithm is presented that builds on the strengths of earlier proposals but, by introducing a wholly random element into the search process, shows greatly improved performance in higher-dimensional search spaces, also in comparison with the binary PSO algorithm of Kennedy and Eberhart.

Manuscript from author [PDF]

ES2013-62

Dynamic Placement with Connectivity for RSNs based on a Primal-Dual Neural Network

Rafael Lima Carvalho, Lunlong Zhong, Felipe França, Félix Mora-Camino

Abstract
The present work deals with the dynamic placement of a set of pursuers and a set of relay devices so that the mean distance to a set of moving targets is minimized over a given period of time. The relay devices are in charge of maintaining communication between the pursuers. Moving targets, relay devices and pursuers are limited in their movements from one period to the next. The periodic problem is formulated as a linear quadratic programming model, and a primal-dual neural network is proposed to solve the current optimization problem from one stage to the next. Moreover, the feasibility of the proposed approach is demonstrated through a numerical example.

Manuscript from author [PDF]



Machine Learning for multimedia applications


ES2013-13

Machine Learning and Content-Based Multimedia Retrieval

Philippe-Henri Gosselin, David Picard

Abstract

Manuscript from author [PDF]

ES2013-109

Learning associative spatiotemporal features with non-negative sparse coding

Thomas Guthier, Steve Gerges, Volker Willert, Julian Eggert

Abstract
Motion features based on optical flow are very powerful in tasks such as the recognition of human actions or gestures. Usually, they are combined with gradient information to form a set of spatiotemporal features. However, humans can recognize gestures and actions, and thus derive the implied motion, from static images alone. We model this associative recognition within a learned hierarchy of non-negative sparse coding layers. In the first stages, topology-preserving gradient and motion features are processed separately. Afterwards, they are projected onto a combined inner representation that is learned during the training phase. We show that during recognition the learned, combined representation improves the recognition of human actions, even in the absence of explicit motion information.

Manuscript from author [PDF]

ES2013-111

Content-based image retrieval with hierarchical Gaussian Process bandits with self-organizing maps

Ksenia Konyushkova, Dorota Glowacka

Abstract
A content-based image retrieval system based on relevance feedback is proposed. The system relies on an interactive search paradigm where at each round a user is presented with k images and selects the one closest to her target. An approach based on hierarchical Gaussian Process bandits is used to trade off exploration and exploitation when presenting the images in each round. Experimental results show that the new approach compares favorably with previous work.

Manuscript from author [PDF]



Clustering


ES2013-95

Clustering the Vélib’ origin-destinations flows by means of Poisson mixture models

Andry Randriamanamihaga, Etienne Côme, Latifa Oukhellou, Gérard Govaert

Abstract
Studies based on human mobility, including Bicycle Sharing System (BSS) traffic analysis, have expanded over the past few years. They give insight into the underlying urban phenomena linked to city dynamics. This paper presents a generative count-series model using Poisson mixtures to automatically analyse and find temporal-based partitions over the Vélib’ origin-destination (OD) flow data. Such an approach may provide latent factors that reveal how regions of different usage interact over time. More generally, the proposed methodology can be used to cluster the edges of temporal valued graphs with respect to their temporal profiles.

Manuscript from author [PDF]

ES2013-89

Delaunay simplices pruning based clustering

Octavio Razafindramanana, Gilles Venturini

Abstract
We introduce in this paper a new clustering method using the Delaunay triangulation of a set of points as an input. The proposed method is based on pruning away extra simplices of a triangulation according to a local heterogeneity measure which we introduce. This measure provides good clustering results as it leads to better detection of inter-cluster simplices. Our measure is evaluated on a 2-D shape data set.

Manuscript from author [PDF]

ES2013-69

Hierarchical and multiscale Mean Shift segmentation of population grids

Johanna Baro, Etienne Côme, Patrice Aknin, Olivier Bonin

Abstract
The Mean Shift (MS) algorithm identifies clusters that are catchment areas of modes of a probability density function (pdf). We propose to use a multiscale and hierarchical implementation of the algorithm to process grid data of population and automatically identify urban centers and their dependent sub-centers across scales. The multiscale structure is obtained by iteratively increasing the bandwidth of the kernel used to define the pdf on which the MS algorithm works. This induces a hierarchical structure over clusters, since modes merge together when the bandwidth parameter increases.
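
The effect of the bandwidth on the number of modes can be illustrated with a minimal one-dimensional sketch. It assumes a Gaussian kernel and a toy data set; the function names `mean_shift_1d` and `count_modes`, the sample points and the tolerance are illustrative choices, not taken from the paper, which works on population grids.

```python
import math


def mean_shift_1d(points, bandwidth, iterations=200):
    """Move a copy of every point uphill on a Gaussian kernel density estimate."""
    positions = list(points)
    for _ in range(iterations):
        updated = []
        for x in positions:
            # Gaussian kernel weights of the original data points seen from x
            weights = [math.exp(-((x - p) ** 2) / (2.0 * bandwidth ** 2)) for p in points]
            # Mean shift update: kernel-weighted mean of the data
            updated.append(sum(w * p for w, p in zip(weights, points)) / sum(weights))
        positions = updated
    return positions


def count_modes(points, bandwidth, tolerance=1e-3):
    """Count distinct convergence points, i.e. modes of the estimated density."""
    modes = []
    for x in mean_shift_1d(points, bandwidth):
        if all(abs(x - m) > tolerance for m in modes):
            modes.append(x)
    return len(modes)


# Hypothetical data: two well-separated groups of three points each
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
for h in (0.5, 10.0):
    print("bandwidth", h, "modes", count_modes(points, h))
```

With this toy data the small bandwidth keeps the two groups as separate modes, while the large bandwidth merges them into a single mode, which is the mechanism behind the hierarchy described in the abstract.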

Manuscript from author [PDF]

ES2013-87

Bayesian non parametric inference of discrete valued networks

Laetitia Nouedoui, Pierre Latouche

Abstract
We present a nonparametric Bayesian inference strategy to automatically infer the number of classes during the clustering process of a discrete valued random network. Our methodology is related to Dirichlet process mixture models, and inference is performed using a blocked Gibbs sampling procedure. Using simulated data, we show that our approach improves over competitive variational inference clustering methods.

Manuscript from author [PDF]

ES2013-20

ONP-MF: An Orthogonal Nonnegative Matrix Factorization Algorithm with Application to Clustering

Filippo Pompili, Nicolas Gillis, François Glineur, P.A. Absil

Abstract
Given a nonnegative matrix M, the orthogonal nonnegative matrix factorization (ONMF) problem consists in finding a nonnegative matrix U and an orthogonal nonnegative matrix V such that the product UV is as close as possible to M in the sense of the Frobenius norm. The importance of ONMF comes from its tight connection with data clustering. In this paper, we propose a new ONMF method, called ONP-MF, and we show that it outperforms other clustering methods (including ONMF-based methods) in terms of accuracy on several datasets in text clustering and hyperspectral unmixing.
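
The problem stated in the abstract can be written compactly as below. The matrix sizes (M of size m × n, U of size m × k, V of size k × n) and the convention that orthogonality means orthonormal rows of V are assumptions made here for illustration.

```latex
\min_{U \ge 0,\; V \ge 0} \; \lVert M - UV \rVert_F^2
\quad \text{subject to} \quad V V^{\top} = I_k
```

Under this convention the rows of V are nonnegative and mutually orthogonal, so they have disjoint supports and each column of V contains at most one nonzero entry. Each column of M is therefore approximated by a multiple of a single column of U, which is the link to clustering mentioned in the abstract.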

Manuscript from author [PDF]

ES2013-113

Linear spectral hashing

Zalán Bodó, Lehel Csato

Abstract
Spectral hashing assigns binary hash keys to data points. This is accomplished by thresholding the eigenvectors of the graph Laplacian to obtain binary codewords. While calculation for inputs in the training set is straightforward, an intriguing and difficult problem is how to compute the hash codewords for unseen data. A second problem we address concerns the computational difficulties that arise when using the Gaussian similarity measure in spectral hashing: for specific problems -- mainly the processing of large text databases -- we propose linear scalar products as similarity measures and analyze the performance of the algorithm. We implement the linear algorithm and provide an inductive -- generative -- formula that leads to a prediction method similar to locality-sensitive hashing for a new data point. Experiments on document retrieval show promising results.

Manuscript from author [PDF]

ES2013-90

Normalized cuts clustering with prior knowledge and a pre-clustering stage

Diego Peluffo-Ordoñez, Andrés Eduardo Castro-Ospina, Diego Chavez-Chamorro, Carlos Daniel Acosta-Medina, Germán Castellanos-Dominguez

Abstract
Clustering is of interest in cases where data are not sufficiently labeled and a prior training stage is infeasible. In particular, spectral clustering based on graph partitioning is of interest for solving problems with highly non-linearly separable classes. However, spectral methods, such as the well-known normalized cuts, involve the computation of eigenvectors, which is highly time-consuming for large data. In this work, we propose an alternative way to solve the normalized cuts problem for clustering, achieving the same results as conventional spectral methods but requiring less processing time. Our method consists of a heuristic search to find the best binary cluster indicator matrix, in such a way that the pairs of nodes with the greatest similarity values are grouped first and the remaining nodes are clustered following a heuristic algorithm that searches the similarity-based representation space. The proposed method is tested on a public domain image data set. Results show that our method reaches comparable results at a lower computational cost.

Manuscript from author [PDF]

ES2013-33

Network community detection with edge classifiers trained on LFR graphs

Twan van Laarhoven, Elena Marchiori

Abstract
A popular method for generating graphs with known community structure is the Lancichinetti-Fortunato-Radicchi (LFR) model. This paper investigates the use of LFR graphs as training data for learning classifiers that discriminate between edges that are 'within' a community and 'between' network communities. We trained linear edge-wise weighted support vector machine classifiers on LFR graphs generated with different amounts of mixing between communities. Results of a comparative experimental analysis show that a classifier trained on a graph with more mixing also works well when tested on LFR benchmark graphs generated using less mixing, while it achieves mixed performance on real-life networks, with a tendency towards finding many communities.

Manuscript from author [PDF]



Regression and forecasting


ES2013-81

Decoding stimulation intensity from evoked ECoG activity using support vector regression

Armin Walter, Georgios Naros, Martin Spüler, Alireza Gharabaghi, Wolfgang Rosenstiel, Martin Bogdan

Abstract
One of the unsolved problems of the application of cortical stimulation for therapeutic means is the selection of optimal stimulation parameters. Using support vector regression, we demonstrate that the intensity of single pulse electrical stimulation can be decoded from the waveform of the evoked electrocorticographic (ECoG) activity, even if intensities used for training and testing of the regression model are disjoint. This was most effective when stimulation was applied directly over the motor cortex, less so for pre-motor and sensory cortex. Thus, if the optimal shape of the evoked neural response to stimulation is known, a regression model trained on the responses to a small set of stimulation intensities could be sufficient to determine the optimal stimulation intensity.

Manuscript from author [PDF]

ES2013-98

Neurally imprinted stable vector fields

Andre Lemme, Klaus Neumann, Felix Reinhart, Jochen Steil

Abstract
We present a novel learning scheme to imprint stable vector fields into Extreme Learning Machines (ELMs). The networks represent movements, where asymptotic stability is incorporated through constraints derived from a Lyapunov function. We show that our approach successfully performs stable and smooth point-to-point movements learned from human handwriting movements.

Manuscript from author [PDF]

ES2013-79

Ensembles of genetically trained artificial neural networks for survival analysis

Jonas Kalderstam, Patrik Edén, Mattias Ohlsson

Abstract
We have developed a prognostic index model for survival data based on an ensemble of artificial neural networks that optimizes directly on the concordance index. Approximations of the c-index are avoided with the use of a genetic algorithm, which does not require gradient information. The model is compared with Cox proportional hazards (COX) and three support vector machine (SVM) models by Van Belle et al. on two clinical data sets, and only with COX on one artificial data set. Results indicate comparable performance to COX and SVM models on clinical data and superior performance compared to COX on non-linear data.
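
For readers unfamiliar with the concordance index, the following is a minimal sketch of how it is commonly computed for right-censored survival data. The function name `concordance_index`, the argument layout and the half-credit treatment of tied predictions are assumptions for illustration and may differ from the definition used in the paper.

```python
def concordance_index(times, events, predicted):
    """Fraction of comparable pairs ordered correctly by the predictions.

    times     -- observed survival or censoring times
    events    -- 1 if the event was observed, 0 if the subject was censored
    predicted -- model output where a larger value means longer predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the shorter time ended in an observed event
            if events[i] and times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions get half credit (assumption)
    return concordant / comparable if comparable else float("nan")


# Hypothetical toy data: three subjects, all with observed events
print(concordance_index([1, 2, 3], [1, 1, 1], [0.1, 0.5, 0.9]))
```

Because the index depends only on the ordering of pairs, it is piecewise constant in the model parameters, which is consistent with the abstract's use of a genetic algorithm that needs no gradient information.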

Manuscript from author [PDF]

ES2013-51

Optimization of Gaussian process hyperparameters using Rprop

Manuel Blum, Martin Riedmiller

Abstract
Gaussian processes are a powerful tool for non-parametric regression. Training can be realized by maximizing the likelihood of the data given the model. We show that Rprop, a fast and accurate gradient-based optimization technique originally designed for neural network learning, can outperform more elaborate unconstrained optimization methods on real world data sets, where it is able to converge more quickly and reliably to the optimal solution.

Manuscript from author [PDF]

ES2013-35

Are Rosenblatt multilayer perceptrons more powerfull than sigmoidal multilayer perceptrons? From a counter example to a general result

Jose Fonseca

Abstract
In the eighties, the lack of an efficient algorithm to train multilayer Rosenblatt perceptrons was addressed by sigmoidal neural networks and backpropagation. But should we still try to find an efficient algorithm to train multilayer hardlimit neural networks, a task known to be an NP-complete problem? In this work we show that this would not be a waste of time, by means of a counter example in which a two-layer Rosenblatt perceptron with 21 neurons showed much more computational power than a sigmoidal feedforward two-layer neural network with 300 neurons trained by backpropagation on the same classification problem. We show why the synthesis of logical functions with threshold gates or hardlimit perceptrons is an active research area in VLSI design and nanotechnology, and we review some of the methods to synthesize logical functions with a multilayer hardlimit perceptron. We propose, as a near-future objective, the search for an efficient method to synthesize any classification problem with analog inputs using a two-layer hardlimit perceptron. Nevertheless, we recognize that hardlimit multilayer perceptrons cannot approximate continuous functions, as multilayer sigmoidal neural networks easily can. With multilayer hardlimit perceptrons we can only solve classification problems, although any of them, as we plan to demonstrate in the near future.

Manuscript from author [PDF]

ES2013-92

Detection and quantification in real-time polymerase chain reaction

Abou KEITA, Romain HERAULT, Colas CALBRIX, Stéphane Canu

Abstract
The estimation of the concentration of an infectious agent in the environment is a key step in triggering an alert when there is a biological threat. This concentration can be obtained through a quantitative polymerase chain reaction (qPCR). Nevertheless, standard real-time procedures do not address detection delay, which is a main concern in alert triggering. Therefore, we propose a method based on Lasso regression and CUSUM change detection to accurately estimate the concentration while minimizing the detection delay. We compare our results with those found by a standard method (threshold method), and promising results are obtained.
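
The change-detection half of the method can be illustrated with a generic one-sided CUSUM detector. This is a minimal sketch that does not include the Lasso regression step or reproduce the authors' statistic; the function name `cusum_alarm`, the parameters `baseline`, `drift` and `threshold`, and the signal values are all hypothetical.

```python
def cusum_alarm(samples, baseline, drift, threshold):
    """Return the index of the first CUSUM alarm, or None if no alarm is raised.

    The statistic accumulates positive deviations from the baseline, minus a
    drift allowance, and is reset to zero whenever it would become negative.
    """
    statistic = 0.0
    for index, value in enumerate(samples):
        statistic = max(0.0, statistic + (value - baseline - drift))
        if statistic > threshold:
            return index
    return None


# Hypothetical signal: flat baseline followed by a step increase at index 4
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(cusum_alarm(signal, baseline=0.0, drift=0.25, threshold=2.0))
```

The drift and threshold trade detection delay against false alarms: a lower threshold raises the alarm sooner after the change but is more easily triggered by noise.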

Manuscript from author [PDF]

ES2013-43

Temperature Forecast in Buildings Using Machine Learning Techniques

Fernando Mateo, Juan J. Carrasco, Mónica Millán-Giraldo, Abderrahim Sellami, Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas

Abstract
Energy efficiency in buildings requires good prediction of the variables that define the power consumption in the building. Temperature is the most relevant of these variables because it affects the operation of the cooling systems in summer and the heating systems in winter, while also being the main variable that defines comfort. This paper presents the application of classical methods of time series forecasting, such as Autoregressive (AR), Multiple Linear Regression (MLR) and Robust MLR (RMLR) models, along with others derived from more complex machine learning techniques, including Multilayer Perceptron with Non-linear Autoregressive Exogenous (MLP-NARX) and Extreme Learning Machine (ELM), to forecast temperature in buildings. The results obtained in the temperature prediction of several rooms of a building show the advantage of machine learning methods over traditional approaches.

Manuscript from author [PDF]

ES2013-121

Forecasting Financial Markets with Classified Tactical Signals

Patrick Kouontchou, Amaury Lendasse, Yoan Miché, Bertrand Maillet

Abstract
Financial market dynamics can be characterized by macro-economic, micro-financial and market risk indicators, used as leading indicators by market professionals. In this article, we propose a method to identify market states that integrates two classification algorithms: a Robust Kohonen Self-Organising Map and CART. After studying the separation of the market's states using the former, we use the latter to characterize the economic conditions over time and to compute the conditional probabilities of related market states.

Manuscript from author [PDF]



Developments in kernel design


ES2013-10

Developments in kernel design

Lluís Belanche

Abstract

Manuscript from author [PDF]

ES2013-29

A quotient basis kernel for the prediction of mortality in severe sepsis patients

Vicent Ribas Ripoll, Enrique Romero, Juan Carlos Ruiz-Rodríguez, Alfredo Vellido

Abstract
In this paper, we describe a novel kernel for multinomial distributions, namely the Quotient Basis Kernel (QBK), which is based on a suitable reparametrization of the input space through algebraic geometry and statistics. The QBK is used here for data transformation prior to classification in a medical problem concerning the prediction of mortality in patients suffering severe sepsis. This is a common clinical syndrome, often treated at the Intensive Care Unit (ICU) in a time-critical context. Mortality prediction results with Support Vector Machines using QBK compare favorably with those obtained using alternative kernels and standard clinical procedures.

Manuscript from author [PDF]

ES2013-103

Synthetic over-sampling in the empirical feature space

María Pérez-Ortiz, Pedro A. Gutiérrez, César Hervás-Martínez

Abstract
The imbalanced nature of some real-world data is one of the current challenges for machine learning, giving rise to different approaches to handling it. However, preprocessing methods operate in the original input space, presenting distortions when combined with the kernel classifiers, which make use of the feature space. This paper explores the notion of empirical feature space (a Euclidean space which is isomorphic to the feature space) to develop a kernel-based synthetic over-sampling technique, which maintains the main properties of the kernel mapping. The proposal achieves better results than the same oversampling method applied to the original input space.

Manuscript from author [PDF]

ES2013-21

Multi-scale Support Vector Machine Optimization by Kernel Target-Alignment

María Pérez-Ortiz, Pedro A. Gutiérrez, Javier Sánchez-Monedero, César Hervás-Martínez

Abstract
The problem considered is the optimization of a multi-scale kernel, where a different width is chosen for each feature. This idea has barely been studied in the literature, and then only through evolutionary or gradient descent approaches, which explicitly train the learning machine and thereby incur a high computational cost. To cope with this limitation, the problem is explored by making use of an analytical methodology known as kernel-target alignment, where the kernel is optimized by aligning it to the so-called ideal kernel matrix. The results show that the proposal leads to better performance and simpler models at limited computational cost when applying the binary Support Vector Machine (SVM) paradigm.
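
Kernel-target alignment can be sketched in a few lines. The example below assumes the usual definition (the normalized Frobenius inner product between the kernel matrix and the ideal kernel yyᵀ for labels in {-1, +1}) and one particular parametrization of the multi-scale Gaussian kernel; the function names, the toy data and the widths are hypothetical, and the paper's optimization procedure is not reproduced.

```python
import math


def multiscale_gaussian_kernel(X, widths):
    """Gaussian kernel matrix with one width per feature (assumed parametrization)."""
    n = len(X)
    return [
        [
            math.exp(-sum(((a - b) / w) ** 2 for a, b, w in zip(X[i], X[j], widths)))
            for j in range(n)
        ]
        for i in range(n)
    ]


def frobenius_inner(A, B):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def alignment(K, y):
    """Alignment between kernel matrix K and the ideal kernel y y^T."""
    ideal = [[yi * yj for yj in y] for yi in y]
    return frobenius_inner(K, ideal) / math.sqrt(
        frobenius_inner(K, K) * frobenius_inner(ideal, ideal)
    )


# Hypothetical toy data: only the first feature separates the two classes
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
y = [1, 1, -1, -1]
sharp_on_first = alignment(multiscale_gaussian_kernel(X, [1.0, 100.0]), y)
sharp_on_second = alignment(multiscale_gaussian_kernel(X, [100.0, 1.0]), y)
print(sharp_on_first, sharp_on_second)
```

In this toy setting the alignment is higher when the narrow width is placed on the informative feature, and it is computed from the kernel matrix and labels alone, without training a classifier.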

Manuscript from author [PDF]

ES2013-105

Handling missing values in kernel methods with application to microbiology data

Vladimer Kobayashi, Tomas Aluja, Lluís Belanche

Abstract
We discuss several approaches that make it possible for kernel methods to deal with missing values. The first two are extended kernels able to handle missing values without data preprocessing methods. Another two methods are derived from a sophisticated multiple imputation technique involving logistic regression as the local model learner. The performance of these approaches is compared using a binary data set that arises typically in microbiology (the microbial source tracking problem). Our results show that the kernel extensions demonstrate competitive performance in comparison with multiple imputation in terms of predictive accuracy. Moreover, these results are achieved with a simpler and deterministic methodology and entail a much lower computational effort.

Manuscript from author [PDF]



Human Activity and Motion Disorder Recognition: towards smarter Interactive Cognitive Environments


ES2013-11

Human Activity and Motion Disorder Recognition: towards smarter Interactive Cognitive Environments

Jorge Luis Reyes-Ortiz, Alessandro Ghio, Xavier Parra, Davide Anguita, Joan Cabestany, Andreu Català

Abstract
The rise of ubiquitous computing systems in our environment is engendering a strong need for novel approaches to human-computer interaction, either for extending the existing range of possibilities and services available to people or for providing assistance to those with limited conditions. Human Activity Recognition (HAR) is playing a central role in this task by offering the input for the development of more interactive and cognitive environments. This has motivated the organization of the ESANN 2013 Special Session in Human Activity and Motion Disorder Recognition and the execution of a competition in HAR. Here, a compilation of the most recent proposals in the area is presented, accompanied by the results of the contest, which called for innovative approaches to recognize activities of daily living (ADL) from a recently published data set.

Manuscript from author [PDF]

ES2013-57

A heterogeneous database for movement knowledge extraction in Parkinson’s disease

Albert Samà, Carlos Pérez-López, Daniel Rodríguez-Martín, Joan Cabestany, Juan Manuel Moreno-Arostegui, Alejandro Rodríguez-Molinero

Abstract
This paper presents the design and methodology used to create a heterogeneous database for movement knowledge extraction in Parkinson's Disease. This database is being constructed as part of the REMPARK project and is composed of movement measurements acquired from inertial sensors, standard medical scales such as the Unified Parkinson's Disease Rating Scale, and other information obtained from 90 Parkinson's Disease patients. The signals obtained will be used to create movement disorder detection algorithms using supervised learning techniques. The different sources of information and the need for labelled data pose many challenges, which the methodology described in this paper addresses. Some preliminary data obtained are presented.

Manuscript from author [PDF]

ES2013-71

Long term analysis of daily activities in smart home

Labiba Gillani Fahad, Arshad Ali, Muttukrishnan Rajarajan

Abstract
In this paper, we propose an approach to monitor changes in the daily routine of a person using long-term analysis of the activities performed in a smart home. The proposed approach comprises two steps. The first is activity recognition, in which the newly detected activity instances are labeled using a probabilistic neural network as the learning model. In the second step, the daily routine of the occupant of the smart home is analyzed by exploiting the group of activities of a day performed over a period of time. We apply K-means clustering to separate the normal routine from unusual and suspected routines. The proposed approach is validated on a publicly available dataset.

Manuscript from author [PDF]

ES2013-76

Sensor Positioning for Activity Recognition Using Multiple Accelerometer-Based Sensors

Lei Gao, Alan Bourke, John Nelson

Abstract
Physical activity has a positive impact on people’s well-being and can decrease the occurrence of chronic disease. To date, a substantial number of research studies have focused on activity recognition using accelerometer and gyroscope-based sensors. However, the sensor position and the sensor combination that give the best recognition performance with the minimum number of sensors have not been sufficiently investigated. This study proposes a method that uses multiple accelerometer-based sensors at different body locations to investigate this problem. The dataset was collected in a study conducted by the eCAALYX project. Eight subjects were recruited to perform eight normal scripted activities in different life scenarios, each repeated three times. Thus a total of 192 activities were recorded. The collected dataset was used to find the most suitable sensor subset for recognizing Activities of Daily Living (ADLs).

Manuscript from author [PDF]

ES2013-88

Multi-user Blood Alcohol Content estimation in a realistic simulator using Artificial Neural Networks and Support Vector Machines

Audrey Robinel, Didier Puzenat

Abstract
We instrumented a realistic car simulator to extract low-level data related to the driver's use of the vehicle controls. After processing these data, we generated features that were fed to a Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM) in order to determine whether the driver was over a blood alcohol content (BAC) threshold, and even to estimate the BAC value. We discuss the results of the prototype using the MLP and SVM (or SVR) algorithms in both single-user and multi-user contexts.

Manuscript from author [PDF]

ES2013-84

A Public Domain Dataset for Human Activity Recognition using Smartphones

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz

Abstract
Human-centered computing is an emerging research field that aims to understand human behavior and integrate users and their social context with computer systems. One of the most recent, challenging and appealing applications in this framework consists in sensing human body motion using smartphones to gather context information about people's actions. In this context, we describe in this work an Activity Recognition database, built from the recordings of 30 subjects doing Activities of Daily Living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors, which is released to the public domain on a well-known on-line repository. Results obtained on the dataset by exploiting a multiclass Support Vector Machine (SVM) are also reported.

Manuscript from author [PDF]

ES2013-124

A One-Vs-One Classifier Ensemble With Majority Voting for Activity Recognition

Bernardino Romera-Paredes, M. S. H. Aung, Nadia Bianchi-Berthouze

Abstract
A solution for the automated recognition of six full body motion activities is proposed. This problem is posed by the release of the Activity Recognition database and forms the basis for a classification competition at the European Symposium on Artificial Neural Networks 2013. The data-set consists of motion characteristics of thirty subjects captured using a single device delivering accelerometric and gyroscopic data. Included in the released data-set are 561 processed features in both the time and frequency domains. The proposed recognition framework consists of an ensemble of linear support vector machines each trained to discriminate a single motion activity against another single activity. A majority voting rule is used to determine the final outcome. For comparison, a six "winner take all" multiclass support vector machine ensemble and k-Nearest Neighbour models were also implemented. Results show that the system accuracy for the one versus one ensemble is 96.4% for the competition test set. Similarly, the multiclass SVM ensemble and k-Nearest Neighbour returned accuracies of 93.7% and 90.6% respectively. The outcomes of the one versus one method were submitted to the competition resulting in the winning solution.
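
The one-versus-one voting scheme can be sketched as follows. For six activities it needs one binary classifier per pair of classes, i.e. 15 classifiers. In this minimal sketch the linear support vector machines are replaced by hypothetical nearest-centre rules on a single made-up feature; the class names, the centres and the alphabetical tie-break are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(sample, classes, pairwise_classifiers):
    """Let every pairwise classifier vote and return the class with most votes."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_classifiers[(a, b)](sample)] += 1
    best = max(votes.values())
    # Deterministic tie-break: smallest label among the top-voted classes (assumption)
    return min(label for label, count in votes.items() if count == best)


# Hypothetical stand-ins for the trained binary classifiers
centres = {"sit": 0.0, "stand": 1.0, "walk": 3.0}


def make_pairwise(a, b):
    """Binary rule for classes a and b: pick the class with the nearer centre."""
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b


classes = sorted(centres)
pairwise = {(a, b): make_pairwise(a, b) for a, b in combinations(classes, 2)}

for value in (0.1, 1.2, 2.9):
    print(value, one_vs_one_predict(value, classes, pairwise))
```

Each pairwise classifier only has to separate two activities, so the individual decision problems stay simple, and the vote resolves disagreements between them.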

Manuscript from author [PDF]

ES2013-123

A sparse kernelized matrix learning vector quantization model for human activity recognition

Marika Kästner, Marc Strickert, Thomas Villmann

Abstract
The contribution describes the application of the 'Computational Intelligence Group' from the University of Applied Sciences Mittweida (Germany) to the ESANN'2013 Competition on 'Human Activity Recognition (HAR)' using Android-OS smartphone sensor signals. We applied a kernel variant of learning vector quantization with metric adaptation with only one prototype vector per class (sparse model). This model obtains very good accuracies and additionally provides class correlation information. Further, the model allows an optimized class visualization.

Manuscript from author [PDF]

ES2013-122

A competitive approach for human activity recognition on smartphones

Attila Reiss, Gustaf Hendeby, Didier Stricker

Abstract
This paper describes a competitive approach developed for an activity recognition challenge. The competition was defined on a new and publicly available dataset of human activities, recorded with smartphone sensors. This work investigates different feature sets for the activity recognition task of the competition. Moreover, the focus is also on the introduction of a new, confidence-based boosting algorithm called ConfAdaBoost.M1. Results show that the new classification method outperforms commonly used classifiers, such as decision trees or AdaBoost.M1.

Manuscript from author [PDF]



Classification


ES2013-104

A dictionary learning based method for aCGH segmentation

Salvatore Masecchia, Saverio Salzo, Annalisa Barla, Alessandro Verri

Abstract
The starting point of our work is to devise a model for segmentation of aCGH data. We propose an optimization method based on dictionary learning and regularization and we compare it with a state-of-the-art approach, presenting our experimental results on synthetic data.

Manuscript from author [PDF]

ES2013-75

A Learning Machine with a Bit-Based Hypothesis Space

Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella

Abstract
We propose in this paper a bit-based classifier, picked from a hypothesis space described according to sparsity and locality principles: the complexity of the corresponding space of functions is controlled through the number of bits needed to represent it, so that it will include the classifiers that will be most likely chosen by the learning procedure. Through an introductory example, we show how the number of bits, the sparsity of the representation and the local definition approach affect the complexity of the space of functions from which the final classifier is selected.

Manuscript from author [PDF]

ES2013-65

Optimization by Variational Bounding

Joe Staines, David Barber

Abstract
We discuss a general technique that forms a differentiable bound on non-differentiable objective functions by bounding the function optimum by its expectation with respect to a parametric variational distribution. We describe sufficient conditions for the bound to be convex with respect to the variational parameters. As example applications we consider variants of sparse linear regression and SVM training.
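
The bound named in this abstract can be written out as follows. This is a sketch in assumed notation (objective $f$, variational distribution $p(x \mid \theta)$, bound $U$), none of which is taken from the paper itself.

```latex
% The minimum of f never exceeds its average under any distribution:
\min_{x} f(x) \;\le\; \mathbb{E}_{p(x \mid \theta)}\bigl[f(x)\bigr] \;=:\; U(\theta)

% If p is differentiable in \theta (and under the usual regularity
% conditions), U is differentiable even when f is not:
\nabla_{\theta} U(\theta)
  = \mathbb{E}_{p(x \mid \theta)}\bigl[f(x)\,\nabla_{\theta} \log p(x \mid \theta)\bigr]
```

Minimizing $U(\theta)$ over $\theta$ therefore minimizes an upper bound on the optimum of $f$.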

Manuscript from author [PDF]

ES2013-118

Support vector machine-based approach for multi-labelers problems

Santiago Murillo Rendón, Diego Peluffo-Ordoñez, Germán Castellanos-Dominguez

Abstract
We propose a first approach to quantify the panelist's labeling by generalizing a soft-margin support vector machine classifier to multi-label analysis. This variation consists of formulating the optimization problem within a quadratic programming framework instead of using a heuristic search algorithm. Our method's outcomes are penalty or relevance values associated with each panelist, where a lower value indicates a better-performing labeler. For experiments, two databases are considered. Firstly, the well-known Iris with multiple artificial labels. Secondly, a multi-label speech database for detecting hypernasality. Obtained penalty factors are compared with both standard supervised and non-supervised measurements. The results are promising for assessing the concordance among panelists, taking into account the structure of data.

Manuscript from author [PDF]

ES2013-115

Read classification for next generation sequencing

James Hogan, Peter Holland, Alex Holloway, Robert Petit, Timothy Read

Abstract
Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine, and a method of considerable diagnostic value. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, it is necessary to ensure that the collected reads are consistent with the species or species group assumed, and not corrupted in some way. The bacterium Staphylococcus aureus is a common infectious agent in hospitals, causing severe and potentially life-threatening infections, with some strains exhibiting antibiotic resistance. In this paper, we apply a Support Vector Machine classifier to the important problem of distinguishing S. aureus sequencing projects from a range of alternatives, including other pathogens and closely related Staphylococci. Using a representation based on sequence k-mers of various lengths, we are able to make the correct prediction in over 95% of cases, while reporting almost no false positives, and implicating features with important functional associations in the bacterium.
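
The k-mer representation mentioned in this abstract can be sketched as a sparse count vector over overlapping substrings of a read. The function names, the choice of k values, and the example read are hypothetical, and the paper's actual feature pipeline may differ.

```python
from collections import Counter


def kmer_counts(read, k):
    """Count every overlapping substring of length k in a read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def kmer_features(read, ks=(2, 3)):
    """Merge counts for several k values into one sparse feature mapping.

    k-mers of different lengths never collide, since the key is the
    substring itself.
    """
    features = Counter()
    for k in ks:
        features.update(kmer_counts(read, k))
    return features


# Hypothetical read, far shorter than real sequencing output.
example = kmer_counts("ACGTACG", 3)
combined = kmer_features("ACGTACG")
```

A mapping of this kind can be turned into a fixed-length vector by indexing the observed k-mers, which is the form a Support Vector Machine would typically consume.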

Manuscript from author [PDF]

ES2013-107

A new metric for dissimilarity data classification based on Support Vector Machines optimization

Agata Manolova, Anne Guerin-Dugue

Abstract
Dissimilarities are extremely useful in many real-world pattern classification problems, where the data resides in a complicated, complex space, and it can be very difficult, if not impossible, to find useful feature vector representations. In these cases a dissimilarity representation may be easier to come by. The goal of this work is to provide a new technique based on Support Vector Machines (SVM) optimization that can be a good alternative in terms of accuracy compared to known methods using dissimilarities such as k nearest neighbor classifier (kNN), prototype-based dissimilarity classifiers and distance kernel based SVM classifiers.

Manuscript from author [PDF]

ES2013-61

DYNG: Dynamic Online Growing Neural Gas for stream data classification

Oliver Beyer, Philipp Cimiano

Abstract
In this paper we introduce Dynamic Online Growing Neural Gas (DYNG), a novel online stream data classification approach based on Online Growing Neural Gas (OGNG). DYNG exploits labelled data during processing to adapt the network structure as well as the speed of growth of the network to the requirements of the classification task. It thus speeds up learning for new classes/labels and dampens growth of the subnetwork representing the class once the class error converges. We show that this strategy is beneficial in life-long learning settings involving non-stationary data, giving DYNG an increased performance in highly non-stationary phases compared to OGNG.

Manuscript from author [PDF]

ES2013-15

Prior knowledge in an end-user trainable machine vision framework

Klaas Dijkstra, Walter Jansen, Jaap van de Loosdrecht

Abstract
The increasing popularity of machine vision based solutions in common applications calls for a structured approach for incorporating the end user's domain knowledge and limiting the solution's dependency on expert knowledge. We propose a framework facilitating optimized classification results and will show several approaches in which prior knowledge of the solution is captured in a neural network or in a geometric pattern matcher. The methodology is applied to disc print reading for antibiotic susceptibility testing by disc diffusion. Results show that increased prior knowledge produces better classifiers, and that more thorough optimization is required to increase the accuracy of classifiers which use less prior knowledge.

Manuscript from author [PDF]

ES2013-59

Border sensitive fuzzy vector quantization in semi-supervised learning

Tina Geweniger, Marika Kästner, Thomas Villmann

Abstract
We propose a semi-supervised fuzzy vector quantization method for the classification of incompletely labeled data. Since information contained within the structure of the data set should not be neglected, our method considers the whole data set during the learning process. In contrast to known methods, our approach uses neighborhood cooperativeness, known from Neural Gas, for stable prototype learning. Further improvement of the classification accuracy is achieved by including class border sensitivity inspired by Support Vector Machines, again improved by neighborhood learning.

Manuscript from author [PDF]

ES2013-22

B-bleaching: Agile Overtraining Avoidance in the WiSARD Weightless Neural Classifier

Danilo Carvalho, Hugo Carneiro, Felipe França, Priscila Lima

Abstract
Weightless neural networks constitute a still not fully explored Machine Learning paradigm, even if its first model, WiSARD, is considered. Bleaching, an improvement on WiSARD's learning mechanism, was recently proposed in order to avoid overtraining. Although presenting very good results in different application domains, the original sequential bleaching and its confidence modulation mechanisms still offer room for improvement. This paper presents a new variation of the bleaching mechanism and compares the three strategies' performance on a complex domain, that of multilingual grammatical categorization. Experiments considered both number of iterations and accuracy. Results show that binary bleaching allows for a considerable reduction in the number of iterations without introducing loss of accuracy.

Manuscript from author [PDF]

ES2013-101

WIPS: the WiSARD Indoor Positioning System

D.O. Cardoso, J. Gama, Massimo De Gregorio, Felipe França, Maurizio Giordano, Priscila Lima

Abstract
In this paper, we present a WiSARD-based system addressing the problem of Indoor Positioning (IP) by taking advantage of pervasively available infrastructures (WiFi Access Points – AP). The goal is to develop a system to be used to position users in indoor environments, such as museums, malls, factories, and offshore platforms. Based on the fingerprint approach, we show how the proposed weightless neural system provides very good results in terms of performance and positioning resolution. Both the approach to the problem and the system will be presented through two correlated experiments.

Manuscript from author [PDF]

ES2013-72

Cost-sensitive cascade graph neural networks

Van Tuc Nguyen, Ah Chung Tsoi, Markus Hagenbuchner

Abstract
This paper introduces a novel cost sensitive approach to a cascade of Graph Neural Networks for learning from unbalanced data in the graph structured domain. The proposed method is shown to be very effective in addressing the undesirable effects of unbalanced data distribution on learning systems. The proposed idea is based on a weighting mechanism which forces the network to encode misclassified graphs (or nodes) more strongly. The idea is applied to Graph Neural Networks which are capable of encoding complex graph structured data. We evaluate the model through an application to a well known Web spam detection problem, and demonstrate that the general network performance is improved as a result.

Manuscript from author [PDF]



Sparsity for interpretation and visualization in inference models


ES2013-14

Research directions in interpretable machine learning models

Vanya Van Belle, Paulo Lisboa

Abstract

Manuscript from author [PDF]

ES2013-3

Learning regression models with guaranteed error bounds

Clemens Otte

Abstract
The combination of a symbolic regression model with a residual Gaussian Process is proposed for providing an interpretable model with improved accuracy. While the learned symbolic model is highly interpretable the residual model usually is not. However, by limiting the output of the residual model to a defined range a worst-case guarantee can be given in the sense that the maximal deviation from the symbolic model is always below a defined limit. When ranking the accuracy and interpretability of several different approaches on the SARCOS data benchmark the proposed combination yields the best result.
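
The worst-case guarantee described in this abstract can be illustrated with a small sketch. It assumes the residual output is limited by simple clipping, which the abstract does not specify, and both models below are hypothetical stand-ins (the residual function replaces a Gaussian Process mean).

```python
def clip(value, limit):
    """Restrict value to the closed interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def bounded_predict(symbolic_model, residual_model, x, limit):
    """Symbolic prediction plus a residual correction of bounded size.

    Because the correction is clipped, the result never deviates from
    the symbolic model by more than `limit`.
    """
    return symbolic_model(x) + clip(residual_model(x), limit)


# Hypothetical models for illustration only.
def symbolic(x):
    return 2.0 * x + 1.0


def residual(x):
    return 10.0 * x


far = bounded_predict(symbolic, residual, 3.0, 0.5)    # residual clipped
near = bounded_predict(symbolic, residual, 0.01, 0.5)  # residual kept
```

The interpretable symbolic model thus remains a valid description of the combined predictor up to the chosen limit.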

Manuscript from author [PDF]

ES2013-63

Sparse approximations for kernel learning vector quantization

Daniela Hofmann, Barbara Hammer

Abstract
Various prototype based learning techniques have recently been extended to similarity data by means of kernelization. While state-of-the-art classification results can be achieved this way, kernelization loses one important property of prototype-based techniques: a representation of the solution in terms of few characteristic prototypes which can directly be inspected by experts. In this contribution, we introduce several different ways to obtain sparse representations for kernel learning vector quantization and compare their efficiency and performance in connection to the underlying data characteristics in diverse benchmark scenarios.

Manuscript from author [PDF]

ES2013-36

Robust cartogram visualization of outliers in manifold learning

Alessandra Tosi, Alfredo Vellido

Abstract
Most real data sets contain atypical observations, often referred to as outliers. Their presence may have a negative impact in data modeling using machine learning. This is particularly the case in data density estimation approaches. Manifold learning techniques provide low-dimensional data representations, often oriented towards visualization. The visualization provided by density estimation manifold learning methods can be compromised by the presence of outliers. Recently, a cartogram-based representation of model-generated distortion was presented for nonlinear dimensionality reduction. Here, we investigate the impact of outliers on this visualization when using manifold learning techniques that behave robustly in their presence.

Manuscript from author [PDF]

ES2013-44

ManiSonS: A New Visualization Tool for Manifold Clustering

José M. Martínez-Martínez, Pablo Escandell-Montero, José D. Martín-Guerrero, Joan Vila-Francés, Emilio Soria-Olivas

Abstract
Manifold learning is an important theme in machine learning. This paper proposes a new visualization approach to manifold clustering. The method is based on pie charts in order to obtain meaningful visualizations of the clustering results when applying a manifold technique. In addition, the proposed approach extracts all the existing relationships among the attributes of the different clusters and finds the most important variables of the manifold in order to distinguish among the different clusters. The methodology is tested on one synthetic data set and one real data set. Achieved results show the suitability and usefulness of the proposed approach.

Manuscript from author [PDF]

ES2013-83

Visualizing pay-per-view television customers churn using cartograms and flow maps

David L. García, Angela Nebot, Alfredo Vellido

Abstract
Media companies aggressively compete for their share of the pay-per-view television market. Such share can only be kept or improved by avoiding customer defection, or churn. The analysis of customers' data should provide insight into customers' behavior over time and help prevent churn. Data visualization can be part of this analysis. Here, a database of pay-per-view television customers is visualized using a nonlinear manifold learning model. This visualization is enhanced through, first, the reintroduction of the local nonlinear distortion using a cartogram technique and, second, the visualization of customer migrations using flow maps. Both techniques are inspired by geographical representation.

Manuscript from author [PDF]

ES2013-86

Visualizing dependencies of spectral features using mutual information

Andrej Gisbrecht, Yoan Miché, Barbara Hammer, Amaury Lendasse

Abstract
The curse of dimensionality leads to problems in machine learning when dealing with high dimensionality. This aspect is particularly pronounced when intrinsically infinite dimensionality is faced, as is the case for spectral or functional data. Feature selection constitutes one possibility to deal with this problem. Often, it relies on mutual information as an evaluation tool for the feature importance; however, it might be overlaid by intrinsic biases such as a high correlation of neighboring function values for functional data. In this paper we propose to assess feature correlations of spectral data by an overlay of prior dependencies due to the functional nature and its similarity as measured by mutual information, enabling a quick overall assessment of the relationships between features. By integrating the Nyström approximation technique, the usually time consuming step to compute all pairwise mutual informations can be reduced to only linear complexity in the number of features.

Manuscript from author [PDF]
