ESANN2017

25th European Symposium on Artificial Neural Networks
Bruges, Belgium, April 26-27-28

Content of the proceedings



Deep and kernel methods: best of two worlds


ES2017-5

Bridging deep and kernel methods

Lluís Belanche, Marta Costa-jussa

Abstract

Manuscript from author [PDF]

ES2017-108

Structure optimization for deep multimodal fusion networks using graph-induced kernels

Dhanesh Ramachandram, Michal Lisicki, Timothy J. Shields, Mohamed R. Amer, Graham W. Taylor

Abstract
A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.

Manuscript from author [PDF]

ES2017-97

Scalable Hybrid Deep Neural Kernel Networks

Siamak Mehrkanoon, Andreas Zell, Johan A. K. Suykens

Abstract
This paper introduces a novel hybrid deep neural kernel framework. The proposed deep learning model combines a neural-network-based architecture with a kernel-based model. In particular, an explicit feature map based on random Fourier features makes the transition between the two architectures more straightforward and makes the model scalable to large datasets by solving the optimization problem in the primal. The introduced framework can be considered as the first building block for the development of even deeper models and more advanced architectures. Experimental results show a significant improvement over shallow models on several medium to large scale real-life datasets.
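
As an illustration of the explicit feature map mentioned in the abstract above, the following is a minimal sketch of random Fourier features for the Gaussian (RBF) kernel. It is not the authors' implementation: the kernel choice, the function name `random_fourier_features`, the value of `gamma`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def random_fourier_features(X, n_features, gamma, rng):
    """Map X to z(X) so that z(x) @ z(y) approximates exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the spectral density of the RBF kernel,
    # which is a Gaussian with variance 2 * gamma per dimension.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # synthetic data, for illustration only
gamma = 0.5                    # assumed kernel width

Z = random_fourier_features(X, n_features=5000, gamma=gamma, rng=rng)

# Compare the approximate Gram matrix with the exact RBF Gram matrix.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)
max_abs_error = np.abs(Z @ Z.T - K_exact).max()
print(f"max |Z Z^T - K| = {max_abs_error:.4f}")
```

Because `Z` is an explicit, finite-dimensional representation, a linear model trained on `Z` works in the primal, with a cost that grows with the number of random features rather than with the size of the full kernel matrix.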

Manuscript from author [PDF]

ES2017-120

Learning dot-product polynomials for multiclass problems

Ivano Lauriola, Michele Donini, Fabio Aiolli

Abstract
Several mechanisms exist in the literature to solve a multiclass classification problem exploiting a binary kernel-machine. Most of them are based on problem decomposition, which consists of splitting the problem into many binary tasks. These tasks have different complexity and they require different kernels. Our goal is to use the Multiple Kernel Learning (MKL) paradigm to learn the best dot-product kernel for each decomposed binary task. In this context, we propose an efficient learning procedure to reduce the search space of hyperparameters and show its empirical effectiveness.

Manuscript from author [PDF]

ES2017-63

Support vector components analysis

Michiel van der Ree, Jos Roerdink, Christophe Phillips, Gaëtan Garraux, Eric Salmon, Marco Wiering

Abstract
In this paper we propose a novel method for learning a distance metric in the process of training Support Vector Machines (SVMs) with various kernels. A transformation matrix is adapted in such a way that the SVM dual objective of a classification problem is optimized. By using a wide transformation matrix the method can effectively be used as a means of supervised dimensionality reduction. We compare our method with other algorithms on a toy dataset and on PET-scans of patients with various Parkinsonisms, finding that our method either outperforms or performs on par with the other algorithms.

Manuscript from author [PDF]

ES2017-37

Algebraic multigrid support vector machines

Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro

Abstract
The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets for reasons such as the complexity and number of iterations in the parameter fitting methods, the underlying optimization solvers, and the nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by the algebraic multigrid. Significant improvement in the running time has been achieved without any loss in quality. The proposed technique is highly beneficial on imbalanced sets. We demonstrate computational results on publicly available and industrial data sets.

Manuscript from author [PDF]

ES2017-135

Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks

Stephan Baier, Sigurd Spieckermann, Volker Tresp

Abstract
With the rising number of interconnected devices and sensors, modeling distributed sensor networks is of increasing interest. Recurrent neural networks (RNN) are considered particularly well suited for modeling sensory and streaming data. When predicting future behavior, incorporating information from neighboring sensor stations is often beneficial. We propose a new RNN-based architecture for context-specific information fusion across multiple spatially distributed sensor stations. Here, latent representations of multiple local models, each modeling one sensor station, are joined and weighted according to their importance for the prediction. This importance is assessed depending on the current context using a separate attention function. We demonstrate the effectiveness of our model on three different real-world sensor network datasets.

Manuscript from author [PDF]

ES2017-96

Fusion of Stereo Vision for Pedestrian Recognition using Convolutional Neural Networks

Danut Ovidiu Pop, Alexandrina Rogozan, Fawzi Nashashibi, Abdelaziz Bensrhair

Abstract
Pedestrian detection is a highly debated issue in the scientific world due to its outstanding importance for a large number of applications, especially in the fields of automotive safety, robotics and surveillance. In spite of the widely varying methods developed in recent years, pedestrian detection is still an open challenge whose accuracy and robustness have to be improved. Therefore, in this paper, we focus on the improvement of the classification component in the pedestrian detection task on the Daimler stereo vision data set by adopting two approaches: 1) by combining three image modalities (intensity, depth and flow) to feed a unique convolutional neural network (CNN) and 2) by fusing the results of three independent CNNs.

Manuscript from author [PDF]

ES2017-50

Training convolutional networks with weight-wise adaptive learning rates

Alan Mosca, George Magoulas

Abstract
Current state-of-the-art Deep Learning classification with Convolutional Neural Networks achieves very impressive results, which are, in some cases, close to human level performance. However, training these methods to their optimal performance requires very long training periods, usually by applying the Stochastic Gradient Descent method. We show that by applying more modern methods, which involve adapting a different learning rate for each weight rather than using a single, global, learning rate for the entire network, we are able to reach close to state-of-the-art performance on the same architectures, and improve the training time and accuracy.

Manuscript from author [PDF]

ES2017-130

Invariant representations of images for better learning

Muthuvel Murugan Issakkimuthu, Subrahmanyam K. V.

Abstract
We study the problem of obtaining representations of images which are invariant to transformation of the image under rotations, towards improving supervised learning. We show that using simple ideas from group representation theory we get invariant representations of images. Off-the-shelf learning algorithms perform much better on such representations.

Manuscript from author [PDF]

ES2017-112

Feature Extraction for On-Road Vehicle Detection Based on Support Vector Machine

Samuel Giatti Silva Filho, Roberto Freire, Leandro dos Santos Coelho

Abstract
Inspired by alarming statistics of deaths and injuries in car accidents, this work presents the development of a vehicle detection method, which is part of an Advanced Driving Assistance System. Computer vision software capable of interpreting real-time events on roads and identifying vehicles with a Support Vector Machine is presented and evaluated using two distinct feature extraction techniques (Invariant Features Transform and Histogram of Oriented Gradients). Promising results in terms of vehicle identification accuracy were obtained when a frame scan technique was integrated into the system.

Manuscript from author [PDF]

ES2017-38

Predicting Time Series with Space-Time Convolutional and Recurrent Neural Networks

Wolfgang Groß, Sascha Lange, Joschka Bödecker, Manuel Blum

Abstract
We present a novel approach to predict time series with a deep recurrent and convolutional neural network. In order to apply modern deep learning techniques to financial time series, deep neural networks have to learn problem-specific, spatio-temporal features. In computer vision, convolutional neural networks with their ability to learn useful spatial features have given rise to groundbreaking results, but spatio-temporal patterns, as they arise in multivariate financial time series, pose additional challenges. We demonstrate that the features the model learns are better than hand-crafted features of a professional trader. We also show that our model beats other models at predicting the price development on the European Power Exchange (EPEX).

Manuscript from author [PDF]


Randomized Machine Learning approaches: analysis and developments


ES2017-4

Randomized Machine Learning Approaches: Recent Developments and Challenges

Claudio Gallicchio, José D. Martín-Guerrero, Alessio Micheli, Emilio Soria-Olivas

Abstract

Manuscript from author [PDF]

ES2017-69

Fisher memory of linear Wigner echo state networks

Peter Tino

Abstract
We study asymptotic properties of Fisher memory of linear Echo State Networks with randomized reservoir coupling prescribed by the class of Wigner matrices. Three properties of Fisher memory normalized per state space dimension are derived: (1) as the system size grows, the contribution of self-coupling of self-loops on reservoir units to the Fisher memory is negligible; (2) the maximal Fisher memory is achieved when the input-to-state coupling is collinear with the dominant eigenvector of the state space coupling matrix; and (3) when the input-to-state coupling is collinear with the sum of eigenvectors of the state space coupling, the expected normalized memory is four times smaller than the maximal memory value.

Manuscript from author [PDF]

ES2017-53

Generalization Performances of Randomized Classifiers and Algorithms built on Data Dependent Distributions

Luca Oneto, Sandro Ridella, Davide Anguita

Abstract
In this paper we prove that a randomized algorithm based on the data generating dependent prior and data dependent posterior Boltzmann distributions of Catoni (2007) is Differentially Private (DP) and shows better generalization properties than the Gibbs (randomized) classifier associated to the same distributions. For this purpose, we develop sharper DP-based generalization bounds, which improve over the current state-of-the-art Hoeffding-type bound.

Manuscript from author [PDF]

ES2017-27

ELM Preference Learning for Physiological Data

Davide Bacciu, Michele Colombo, Davide Morelli, David Plans

Abstract
The work compares two approaches to realize preference learning using Extreme Learning Machine networks, relying on limited and subject-dependent information concerning pairwise relations between data samples. We describe an application within the context of assessing the effect of breathing exercises on heart-rate variability, using a dataset of over 19K exercising sessions. Results highlight the importance of using weight sharing architectures to learn smooth and generalizable complete orders induced by the preference relation.

Manuscript from author [PDF]

ES2017-92

Advanced query strategies for Active Learning with Extreme Learning Machines

Anton Akusok, Emil Eirola, Yoan Miché, Andrey Gritsenko, Amaury Lendasse

Abstract
This work addresses an important part of solving applied problems: data acquisition. Often raw data is cheap while labeling is an expensive manual job. Active Learning reduces the labeling effort by suggesting particular samples with a query strategy. The paper proposes three new query strategies built on recent developments in extreme learning machines: one based on a committee of class-weighted ELMs, one based on prediction intervals found with ELM, and one based on mislabeled sample detection with ELM. The proposed strategies perform at the state-of-the-art level on three real-world datasets.

Manuscript from author [PDF]

ES2017-43

Random projection initialization for deep neural networks

Piotr Iwo Wójcik, Marcin Kurdziel

Abstract
In this work we propose to initialize rectifier neural networks with random projection matrices. We focus on Convolutional Neural Networks and fully-connected networks with pretraining. Our results show that in convolutional networks a well-designed random projection initialization can perform better than the current state-of-the-art He's initialization. Specifically, in our evaluation, initialization based on the Subsampled Randomized Hadamard Transform consistently outperformed He's initialization on several evaluated image classification datasets.

Manuscript from author [PDF]


Classification


ES2017-34

Fine-grained event learning of human-object interaction with LSTM-CRF

Tuan Do, James Pustejovsky

Abstract
Event learning is one of the most important problems in AI. However, notwithstanding significant research efforts, it is still a very complex task, especially when the events involve the interaction of humans or agents with other objects, as it requires modeling human kinematics and object movements. This study proposes a methodology for learning complex human-object interaction (HOI) events, involving the recording, annotation and classification of event interactions. For annotation, we allow multiple interpretations of a motion capture by slicing over its temporal span; for classification, we use Long Short-Term Memory (LSTM) sequence models with a Conditional Random Field (CRF) to constrain the outputs. Using a setup involving captures of human-object interaction as three-dimensional inputs, we argue that this approach could be used for event types involving complex spatio-temporal dynamics.

Manuscript from author [PDF]

ES2017-8

Distance metric learning: a two-phase approach

Bac Nguyen, Carlos Morell, Bernard De Baets

Abstract
Distance metric learning has been successfully incorporated in many machine learning applications. The main challenge arises from the positive semidefiniteness constraint on the Mahalanobis matrix, which results in a high computational cost. In this paper, we develop a novel approach to reduce this computational burden. We first map each training example into a new space by an orthonormal transformation. Then, in the transformed space, we simply learn a diagonal matrix. This two-phase approach is thus much easier and less costly than learning a full Mahalanobis matrix in one phase as is commonly done.
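
The relation between the two phases and a full Mahalanobis matrix can be made concrete with a small sketch. The abstract above does not say how the orthonormal transformation or the diagonal matrix are obtained, so both are assumptions here: the basis `U` is taken from an eigendecomposition of the sample covariance, and the diagonal weights `d` are arbitrary non-negative placeholders rather than learned values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # synthetic training examples

# Phase 1 (assumed choice): orthonormal basis from the covariance eigenvectors.
cov = np.cov(X, rowvar=False)
_, U = np.linalg.eigh(cov)      # columns of U are orthonormal
Z = X @ U                       # each row is U^T x

# Phase 2 (placeholder values): a non-negative diagonal metric in the new space.
d = np.array([0.5, 1.0, 2.0, 0.0])


def diagonal_distance(z1, z2, weights):
    """Weighted Euclidean distance in the transformed space."""
    diff = z1 - z2
    return float(np.sqrt(np.sum(weights * diff ** 2)))


def mahalanobis_distance(x1, x2, M):
    """Distance induced by a full matrix M in the original space."""
    diff = x1 - x2
    return float(np.sqrt(diff @ M @ diff))


# The implied full matrix is M = U diag(d) U^T.
M = U @ np.diag(d) @ U.T
dist_two_phase = diagonal_distance(Z[0], Z[1], d)
dist_full = mahalanobis_distance(X[0], X[1], M)
min_eig = np.linalg.eigvalsh(M).min()
print(dist_two_phase, dist_full, min_eig)
```

With this construction the eigenvalues of `M` are exactly the entries of `d`, so positive semidefiniteness reduces to requiring `d >= 0` elementwise. This is one way to see why learning only a diagonal matrix avoids the costly constraint on a full matrix.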

Manuscript from author [PDF]

ES2017-57

An EM transfer learning algorithm with applications in bionic hand prostheses

Benjamin Paassen, Alexander Schulz, Janne Hahne, Barbara Hammer

Abstract
Modern bionic hand prostheses feature unprecedented functionality, permitting simultaneous motion in multiple degrees of freedom. An intuitive user interface based on muscle signals requires machine learning models. However, current models are not yet sufficiently robust to everyday disturbances, such as electrode shifts. We propose a novel expectation maximization approach for transfer learning to rapidly recalibrate a machine learning model if disturbances occur. In our experimental evaluation we show that even under conditions of incomplete class coverage and few data points our approach finds a viable transfer mapping which improves classification accuracy significantly.

Manuscript from author [PDF]

ES2017-54

Dropout Prediction at University of Genoa: a Privacy Preserving Data Driven Approach

Luca Oneto, Anna Siri, Gianvittorio Luria, Davide Anguita

Abstract
Nowadays many educational institutions crucially need to understand the dynamics at the basis of the university dropout (UD) phenomenon. However, the most informative educational data are personal and subject to strict privacy constraints. The challenge is therefore to develop a data-driven system which accurately predicts student dropouts while preserving the privacy of individual data instances. In the present paper we investigate this problem, making use of data collected at University of Genoa as a case study.

Manuscript from author [PDF]

ES2017-70

Physical activity recognition from sub-bandage sensors using both feature selection and extraction

Thiago Turchetti Maia, Fabio Di Francesco, Valentina Dini, Beatrice Lazzerini, Marco Romanelli, Pietro Salvo

Abstract
In this paper, we present a neural network-based approach to classify the activities performed by 40 subjects by analyzing sub-bandage pressure signals. The approach includes an input dimensionality reduction obtained employing both feature extraction and feature selection techniques. The results show that our model is able to classify the activities performed with 98.12% accuracy.

Manuscript from author [PDF]

ES2017-11

A multi-criteria meta-learning method to select under-sampling algorithms for imbalanced datasets

Romero Morais, Péricles Miranda, Ricardo Silva

Abstract
Standard classifiers consider a balanced distribution of examples' classes in the data, thus, imbalanced datasets may hinder the learning process. Sampling techniques balance the data by adjusting the examples' classes distribution. However, selecting an appropriate sampling technique and its parameters for a given imbalanced dataset is still an open problem. This work proposes a method that uses Meta-Learning to recommend a technique for an imbalanced dataset considering multiple performance criteria. The experiments revealed that the proposal reached results comparable to those achieved by the brute-force approach, overcame the techniques with their default parameters most of the time, and always surpassed the random search approach.

Manuscript from author [PDF]

ES2017-71

Large-scale nonlinear dimensionality reduction for network intrusion detection

Yasir Hamid, Ludovic Journaux, John Aldo Lee, Lucile Sautot, Nabi Bushra, M. Sugumaran

Abstract
Network intrusion detection (NID) is a complex classification problem. In this paper, we combine classification with recent and scalable nonlinear dimensionality reduction (NLDR) methods. Classification and DR are not necessarily adversarial, provided adequate cluster magnification occurring in NLDR methods like t-SNE: DR mitigates the curse of dimensionality, while cluster magnification can maintain class separability. We demonstrate experimentally the effectiveness of the approach by analyzing and comparing results on the big KDD99 dataset, using both NLDR quality assessment and classification rate for SVMs and random forests. Since data involves features of mixed types (numerical and categorical), the use of Gower’s similarity coefficient as metric further improves the results over the classical similarity metric.
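
For readers unfamiliar with the coefficient named in the abstract above, here is a minimal sketch of Gower's similarity for records with mixed numerical and categorical features. It is an unweighted version without missing-value handling, and the two records, their feature names, and the feature ranges are hypothetical values chosen for illustration, not taken from KDD99.

```python
def gower_similarity(a, b, is_categorical, ranges):
    """Unweighted Gower similarity between two mixed-type records.

    Categorical features score 1 on a match and 0 otherwise.
    Numerical features score 1 - |x - y| / range, where range is the
    spread of that feature over the whole dataset.
    """
    scores = []
    for x, y, categorical, value_range in zip(a, b, is_categorical, ranges):
        if categorical:
            scores.append(1.0 if x == y else 0.0)
        elif value_range > 0:
            scores.append(1.0 - abs(x - y) / value_range)
        else:
            # A constant feature cannot separate the two records.
            scores.append(1.0)
    return sum(scores) / len(scores)


# Hypothetical records: (duration, bytes, protocol, flag)
record_a = (2.0, 100.0, "tcp", "SF")
record_b = (4.0, 300.0, "tcp", "REJ")
is_categorical = (False, False, True, True)
ranges = (10.0, 400.0, None, None)   # ranges apply to numerical features only

similarity = gower_similarity(record_a, record_b, is_categorical, ranges)
print(similarity)
```

The per-feature scores here are 0.8, 0.5, 1 and 0, which average to 0.575. A dissimilarity suitable for a method such as t-SNE can then be taken as one minus the similarity.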

Manuscript from author [PDF]

ES2017-117

Acceleration of Prototype Based Models with Cascade Computation

Cem Karaoguz, Alexander Gepperth

Abstract
Prototype-based generative description of data space is shown to be effective in incremental learning. However, computation of similarities of input vectors to prototypes may be demanding especially in the face of high input dimensions and high number of prototypes. The main contribution of the paper is the acceleration of the prototype-based model by a cascade computation approach. The evaluation of the presented architecture on a human detection and pose estimation problem shows that the cascade computation results in a significant reduction of computational resource requirements at the expense of minor degradations in the classification performance.

Manuscript from author [PDF]

ES2017-42

Automatic crime report classification through a weightless neural network

Rafael Adnet Pinho, Walkir Brito, Claudia Motta, Priscila Lima

Abstract
Anonymous crime reporting is a tool that helps to reduce and prevent crime occurrences. The classification of the crime reports received by the call center is necessary for the data organization and also to stipulate the importance of a particular report and its relation to others. The objective of this work is to develop a system that assists the call center's operator by recommending a classification for new reports. The system uses a weightless neural network that automatically attributes a class to a report. At the end of this work it was possible to observe that automatic classifications of crime reports with high accuracy are possible using a weightless neural network.

Manuscript from author [PDF]

ES2017-123

Efficient Neural-based patent document segmentation with Term Order Probabilities

Danilo Silva de Carvalho, Minh-Le Nguyen

Abstract
The internationally growing trend of patent applications puts great pressure on the agents involved in managing this kind of information and creates a demand for efficient and effective patent analysis methods. This work presents a computationally efficient approach for patent document segmentation based on structured ANNs and a simple distributional semantics composition method. The conducted experiments indicate the effectiveness of the approach, which benefits a wide array of patent processing techniques that work upon structured inputs.

Manuscript from author [PDF]


Biomedical data analysis in translational research: integration of expert knowledge and interpretable models


ES2017-2

Biomedical data analysis in translational research: integration of expert knowledge and interpretable models

Gyan Bhanot, Michael Biehl, Thomas Villmann, Dietlind Zühlke

Abstract

Manuscript from author [PDF]

ES2017-67

Feature Relevance Bounds for Linear Classification

Christina Göpfert, Lukas Pfannschmidt, Barbara Hammer

Abstract
Biomedical applications often aim for an identification of relevant features for a given classification task, since these carry the promise of semantic insight into the underlying process. For correlated input dimensions, feature relevances are not unique, and the identification of meaningful subtle biomarkers remains a challenge. One approach is to identify intervals for the possible relevance of given features, a problem related to all relevant feature determination. In this contribution, we address the important case of linear classifiers and recast the problem of inferring feature relevance bounds as a convex optimization problem. We demonstrate the superiority of the resulting technique in comparison to popular feature-relevance determination methods in several benchmarks.

Manuscript from author [PDF]

ES2017-86

Prediction of preterm infant mortality with Gaussian process classification

Olli-Pekka Rinta-Koski, Simo Särkkä, Jaakko Hollmén, Markus Leskinen, Sture Andersson

Abstract
We present a method for predicting preterm infant in-hospital-mortality using Bayesian Gaussian process classification. We combined features extracted from sensor measurements, made during the first 24 hours of care for 581 Very Low Birth Weight infants, with standard clinical features calculated on arrival at the Neonatal Intensive Care Unit. We achieved a classification result with area under curve of 0.94 (standard error 0.02), which is in excess of the results achieved by using the clinical standard SNAP-II and SNAPPE-II scores.

Manuscript from author [PDF]

ES2017-94

Comparison of strategies to learn from imbalanced classes for computer aided diagnosis of inborn steroidogenic disorders

Sreejita Ghosh, Elizabeth Sarah Baranowski, Rick van Veen, Gert-Jan de Vries, Michael Biehl, Wiebke Arlt, Peter Tino, Kerstin Bunte

Abstract

Manuscript from author [PDF]


Environmental signal processing: new trends and applications


ES2017-1

Environmental signal processing: new trends and applications

Matthieu Puigt, Gilles Delmaire, Gilles Roussel

Abstract

Manuscript from author [PDF]

ES2017-103

Solving Inverse Source Problems for Sources with Arbitrary Shapes using Sensor Networks

John Murray-Bruce, Pier Luigi Dragotti

Abstract
Recently, the use of wireless sensor networks for environmental monitoring has been a topic of intensive research. The sensor nodes obtain spatiotemporal samples of physical fields over the region of interest. For most cases these fields are driven by well-known partial differential equations (the diffusion and wave equations, for example), and this prior knowledge can be used to solve such physics-driven inverse source problems (ISPs). In this work, we demonstrate how to estimate the unknown source shape inducing the field by assuming that it can be described by a model having a finite number of unknown parameters.

Manuscript from author [PDF]

ES2017-83

Non-negative decomposition of geophysical dynamics

Manuel Lopez-Radcenco, Abdeldjalil Aïssa-El-Bey, Pierre Ailliot, Ronan Fablet

Abstract
The decomposition of geophysical processes into relevant modes is a key issue for characterization, forecasting and reconstruction problems. The blind separation of contributions from different sources is a well-studied problem in signal and image processing. Recently, significant advances have been reported with the introduction of non-negative and sparse formulations. In this work, we address an extension to the blind decomposition of linear operators or transfer functions between variables of interest with an emphasis on a non-negative setting. As illustrated here, such decompositions are of key interest for the analysis of geophysical dynamics and the relationships between different geophysical variables.

Manuscript from author [PDF]

ES2017-74

Impact of the initialisation of a blind unmixing method dealing with intra-class variability

Charlotte REVEL, Yannick Deville, Véronique ACHARD, Xavier BRIOTTET

Abstract
In hyperspectral imagery, unmixing methods are often used to analyse the composition of the pixels. Such methods usually suppose that a single spectral signature, called an endmember, can be associated with each pure material present in the scene. Such an assumption is no longer valid for materials that exhibit spectral variability due to illumination conditions, weathering, slight variations of the composition, etc. In this paper, we investigate a new method, based on the assumption of a linear mixing model, that deals with intra-class spectral variability. A new formulation of the linear mixing is provided. In our model a pure material cannot be described by a single spectrum in the image but it can in a pixel. A method is presented to handle this new model. It is based on a pixel-by-pixel Nonnegative Matrix Factorization (NMF) method. The method is tested on a semi-synthetic data set built with spectra extracted from a real hyperspectral image and mixtures of these spectra. We particularly focus our tests on studying the impact of the initialisation of our method.

Manuscript from author [PDF]

ES2017-136

Application of Tensor and Matrix Completion on Environmental Sensing Data

Michalis Giannopoulos, Sofia Savvaki, Grigorios Tsagkatakis, Panagiotis Tsakalides

Abstract
As environmental resources utilization becomes more and more crucial, Wireless Sensor Networks (WSNs) are introduced in order to capture the variation of diverse parameters. However, limitations such as network connectivity, power consumption, and storage capacity lead to missing measurements from such networked sensors. To address this problem, we investigate the potential of recovering high dimensional environmental signals from small sets of observations. To account for the dimensionality of the data, we invoke tensor modelling and we propose a low-rank tensor recovery formulation. Experimental results using real WSN data from an indoor industrial environment as well as from an outdoor natural environment demonstrate that the estimation of missing measurements is much better addressed when structural information is considered.

Manuscript from author [PDF]

ES2017-131

Indoor air pollutant sources using Blind Source Separation Methods

Rachid OUARET, Anda IONESCU, Olivier RAMALHO, Yves CANDAU

Abstract
The objective of this study is to separate different sources of variability of air pollutant concentrations time series of particulate matter (PM) monitored in real indoor environments. Different blind source separation (BSS) methods (ICA, PMF, NMF) were applied in order to identify the PM sources and their contributions. The source profiles were characterized by their autocorrelation functions (ACF) which were compared to the ACFs of other variables. Their interpretation was completed by the analysis of polar plots including exogenous factors. Source contributions were also quantified.

Manuscript from author [PDF]

ES2017-149

High dimensionality voltammetric biosensor data processed with artificial neural networks

Andreu González-Calabuig, Georgina Faura, Manel del Valle

Abstract
This work reports the coupling of an array of voltammetric sensors with artificial neural networks (ANN), usually named Electronic Tongue, for the simultaneous quantification of the amino acids tryptophan, tyrosine and cysteine. The obtained signals were compressed using fast Fourier transform (FFT) and then the ANN model was constructed from a set of low-frequency components. An ANN predictive model was obtained by back-propagation, which had 160 input neurons, one hidden layer with 7 neurons and used purelin and satlins functions in the hidden and output layer respectively, trained with a factorial design scheme. The model attained a total normalized root mean square error of 0.032 for an independent test set of data (n=15).
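
The compression step described in the abstract above can be pictured as keeping only the lowest-frequency FFT coefficients of each signal and using them as model inputs. The sketch below is a generic illustration under stated assumptions: the synthetic signal, the number of retained coefficients (`n_keep = 16`), and the function names are hypothetical, and the abstract does not specify how its 160 network inputs were assembled.

```python
import numpy as np


def compress_fft(signal, n_keep):
    """Keep the n_keep lowest-frequency coefficients of a real signal."""
    coeffs = np.fft.rfft(signal)
    return coeffs[:n_keep]


def reconstruct(kept_coeffs, n_samples):
    """Rebuild a signal from truncated coefficients, zero-filling the rest."""
    full = np.zeros(n_samples // 2 + 1, dtype=complex)
    full[: len(kept_coeffs)] = kept_coeffs
    return np.fft.irfft(full, n=n_samples)


# Synthetic smooth signal standing in for one voltammogram.
t = np.linspace(0.0, 1.0, 512, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

kept = compress_fft(signal, n_keep=16)
approx = reconstruct(kept, len(signal))

# Real and imaginary parts form a real-valued feature vector for a model.
features = np.concatenate([kept.real, kept.imag])
max_err = np.abs(approx - signal).max()
print(features.shape, max_err)
```

In this toy case all of the signal's energy lies below the cutoff, so 512 samples are represented by 32 real numbers with negligible reconstruction error. Real measurements would lose some high-frequency detail, and the cutoff would have to be chosen accordingly.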

Manuscript from author [PDF]


Kernels, graphs and clustering


ES2017-116

Learning sparse models of diffusive graph signals

Shuyu Dong, Dorina Thanou, Pierre-Antoine Absil, Pascal Frossard

Abstract
Graph signals that describe data living on irregularly structured domains provide a generic representation for structured information in very diverse applications. The effective analysis and processing of such signals however necessitate good models that identify the most relevant signal components. In this paper, we propose to learn sparse representation models for graph signals that describe heat diffusion processes. This consists in learning a dictionary that incorporates spectral properties of an implicit graph diffusion kernel. The underlying formulation enables the identification of both sparse features and an adaptive graph structure from mere signal observations. Experiments on synthetic and real datasets show that the proposed dictionaries not only reflect the underlying diffusion process but also significantly reduce over-fitting of data in comparison to state-of-the-art methods.

Manuscript from author [PDF]

ES2017-127

The Conjunctive Disjunctive Node Kernel

Dinh Tran Van, Alessandro Sperduti, Fabrizio Costa

Abstract
Gene-disease associations are inferred on the basis of similarities between genes. Biological relationships that are exploited to define similarities range from interacting proteins and proteins that participate in pathways to gene expression profiles. Though graph kernel methods have become a prominent approach for association prediction, most solutions are based on a notion of information diffusion that does not capture the specificity of different network parts. Here we propose a graph kernel method that explicitly models the configuration of each gene’s context. An empirical evaluation on several biological databases shows that our proposal is competitive w.r.t. state-of-the-art kernel approaches.

Manuscript from author [PDF]

ES2017-66

POKer: a Partial Order Kernel for Comparing Strings with Alternative Substrings

Maryam Abdollahyan, Fabrizio Smeraldi

Abstract
We introduce a Partial Order Kernel (POKer) on the weighted sum of local alignment scores that can be used for comparison and classification of strings containing alternative substrings of variable length. POKer is defined over the product of two directed acyclic graphs, each representing a string with alternative substrings, and is computed efficiently using dynamic programming. We evaluate the performance of POKer with Support Vector Machines on a dataset of strings generated by detecting overlapping motifs in a set of simulated DNA sequences. Compared to a generalization of a state-of-the-art string kernel, POKer achieves a higher classification accuracy.

Manuscript from author [PDF]

ES2017-41

Accelerating stochastic kernel SOM

Jérôme Mariette, Fabrice Rossi, Madalina Olteanu, Nathalie Villa-Vialaneix

Abstract
Analyzing non-vectorial data has become a common trend in a number of real-life applications. Various prototype-based methods have been extended to answer this need by means of kernelization that embeds data into an (implicit) Euclidean space. One drawback of those approaches is their complexity, which is commonly on the order of the square or the cube of the number of observations. In this paper, we propose an efficient method to reduce the complexity of the stochastic kernel SOM. The results are illustrated on large datasets and compared to the standard kernel SOM. The approach has been implemented in the latest version (1.2) of the R package SOMbrero.

Manuscript from author [PDF]

ES2017-49

Viral initialization for spectral clustering

Vahan Petrosyan, Alexandre Proutiere

Abstract
Spectral Clustering is one of the most widely used clustering algorithms. To find k clusters, it runs the K-means algorithm on the top k eigenvectors of a Laplacian matrix constructed from the data. As a consequence, it inherits the initialization issues of K-means. In this paper, we propose Viral Initialization (VI), a novel initialization procedure implemented in the Spectral Clustering algorithm before K-means is applied. VI is designed so that the resulting clusterings exhibit low normalized cut (Ncuts) values. This design principle is aligned with the recent observation that "good" clusterings have low Ncuts values. We show, through extensive numerical experiments, that the Spectral Clustering algorithm with VI consistently outperforms other state-of-the-art clustering techniques.
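The normalized cut that this abstract uses as a quality criterion can be computed directly from a weighted adjacency matrix. The sketch below uses the standard definition (for each cluster, the weight of edges leaving it divided by its total degree, summed over clusters). The function name and the toy graph are illustrative assumptions, and the sketch does not implement Viral Initialization itself.

```python
def normalized_cut(weights, labels):
    """Sum over clusters of cut(A, rest) / vol(A).

    weights: symmetric adjacency matrix as a list of lists.
    labels:  cluster id of each node, in node order.
    """
    n = len(weights)
    total = 0.0
    for cluster in set(labels):
        cut = 0.0  # weight of edges leaving the cluster
        vol = 0.0  # sum of degrees of nodes inside the cluster
        for i in range(n):
            if labels[i] != cluster:
                continue
            for j in range(n):
                vol += weights[i][j]
                if labels[j] != cluster:
                    cut += weights[i][j]
        if vol > 0:
            total += cut / vol
    return total


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    W[a][b] = W[b][a] = 1.0

good = normalized_cut(W, [0, 0, 0, 1, 1, 1])  # split along the bridge
bad = normalized_cut(W, [0, 0, 1, 1, 1, 1])   # split through a triangle
```

On this toy graph the partition that separates the two triangles yields the lower value, which matches the intuition that "good" clusterings have low Ncut.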

Manuscript from author [PDF]

ES2017-134

Approximated Neighbours MinHash Graph Node Kernel

Nicolò Navarin, Alessandro Sperduti

Abstract
In this paper, we propose a scalable kernel for nodes in a (huge) graph. In contrast with other state-of-the-art kernels that scale more than quadratically in the number of nodes, our approach scales linearly in the average out-degree and quadratically in the number of nodes (for the Gram matrix computation). The kernel presented in this paper considers neighbours as sets, thus it ignores edge weights. Nevertheless, experimental results on real-world datasets show promising results.
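Treating neighbourhoods as sets makes them amenable to MinHash, which approximates the Jaccard similarity of two sets by comparing short signatures. The following is a generic MinHash sketch, not the kernel proposed in the paper. The hash family, the function names and the assumption that nodes are non-negative integers smaller than the prime are all choices made for illustration.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1, used as the hash modulus


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs defining hash functions h(v) = (a * v + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(neighbours, params):
    """One minimum per hash function over a non-empty set of integer node ids."""
    return [min((a * v + b) % PRIME for v in neighbours) for a, b in params]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


params = make_hash_params(num_hashes=64)
sig_u = minhash_signature({1, 2, 3, 4}, params)
sig_v = minhash_signature({3, 4, 5, 6}, params)
similarity = estimated_jaccard(sig_u, sig_v)  # estimate of |{3, 4}| / |{1, ..., 6}|
```

Computing a signature costs time proportional to the neighbourhood size, while comparing two nodes only depends on the signature length.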

Manuscript from author [PDF]

ES2017-140

Fast hyperparameter selection for graph kernels via subsampling and multiple kernel learning

Michele Donini, Nicolò Navarin, Ivano Lauriola, Fabio Aiolli, Fabrizio Costa

Abstract
Model selection is one of the most computationally expensive tasks in a machine learning application. When dealing with kernel methods for structures, the choice with the largest impact on the overall performance is the selection of the feature bias, i.e. the choice of the concrete kernel for structures. Each kernel in turn exposes several hyper-parameters which also need to be fine tuned. Multiple Kernel Learning offers a way to approach this computational bottleneck by generating a combination of different kernels under different parametric settings. However, this solution still requires the computation of many large kernel matrices. In this paper we propose a method to efficiently select a small number of kernels on a subset of the original data, gaining a dramatic reduction in the runtime without a significant loss of predictive performance.

Manuscript from author [PDF]

ES2017-24

A Simple Cluster Validation Index with Maximal Coverage

Susanne Jauhiainen, Tommi Karkkainen

Abstract
Clustering is an unsupervised technique to detect general, distinct profiles from a given dataset. Just as there are many different clustering methods and algorithms, there exist many cluster validation methods and indices to suggest the number of clusters. The purpose of this paper is, firstly, to propose a new, simple internal cluster validation index. The index has a maximal coverage: also one cluster, i.e., lack of division of a dataset into disjoint subsets, can be detected. Secondly, the proposed index is compared to the available indices from five different packages implemented in R or Matlab to assess its utilizability. The comparison also suggests many interesting findings in the available implementations of the existing indices. The experiments and the comparison support the viability of the proposed cluster validation index.

Manuscript from author [PDF]

ES2017-17

The Top 10 Topics in Machine Learning Revisited: A Quantitative Meta-Study

Patrick Glauner, Manxing Du, Victor Paraschiv, Andrey Boytsov, Isabel Lopez Andrade, Jorge Augusto Meira, Petko Valtchev, Radu State

Abstract
Which topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by doing a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning in order to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach allows reducing the bias of surveys. It reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are. This allows researchers to identify popular topics as well as new and rising topics for their research.

Manuscript from author [PDF]



Regression, robots and biological systems


ES2017-77

Piecewise-Bézier C1 smoothing on manifolds with application to wind field estimation

Pierre-Yves Gousenbourger, Estelle Massart, Antoni Musolas, Pierre-Antoine Absil, Julien M. Hendrickx, Laurent Jacques, Youssef Marzouk

Abstract
We propose an algorithm for fitting C1 piecewise-Bézier curves to (possibly corrupted) data points on manifolds. The curve is chosen as a compromise between proximity to data points and regularity. We apply our algorithm as an example to fit a curve to a set of low-rank covariance matrices, a task arising in wind field modeling. We show that our algorithm has denoising abilities for this application.

Manuscript from author [PDF]

ES2017-95

Reducing variance due to importance weighting in covariate shift bias correction

Van-Tinh Tran, Alex Aussem

Abstract
Covariate shift is a problem in machine learning when the input distributions of training and test data are different (p(x) ≠ p′(x)) while their conditional distribution p(y|x) is the same. A common technique to deal with this problem, called importance weighting, amounts to reweighting the training instances in order to make them resemble the test distribution. However, this usually comes at the expense of a reduction of the effective sample size, which is harmful when the initial training sample size is already small. In this paper, we show that there exists a weighting scheme on the unlabeled data such that the combination of the weighted unlabeled data and the labeled training data mimics the test distribution. We further prove that the labels are missing at random in this combined data set and thus can be imputed safely. Imputing the missing labels mitigates the undesirable sample-size-reduction effect of importance weighting. A series of experiments on synthetic and real-world data are conducted to demonstrate the efficiency of our approach.
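The loss of effective sample size under importance weighting can be made concrete with Kish's formula, n_eff = (Σ w)² / Σ w², where each training instance carries the weight w(x) = p′(x) / p(x). The sketch below assumes the weights are already known. The function names are illustrative, and Kish's formula is a common diagnostic rather than necessarily the measure used by the authors.

```python
def effective_sample_size(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighted_mean(values, weights):
    """Importance-weighted average, e.g. of per-instance losses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


uniform = effective_sample_size([1.0, 1.0, 1.0, 1.0])  # no shift: all 4 samples count
skewed = effective_sample_size([3.0, 1.0])             # uneven weights: fewer than 2
```

Uniform weights leave the sample size unchanged, whereas a few dominant weights shrink it towards one, which is the variance problem the abstract refers to.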

Manuscript from author [PDF]

ES2017-47

Complex activity patterns generated by short-term synaptic plasticity

Bulcsu Sandor, Claudius Gros

Abstract
Short-term synaptic plasticity (STSP) affects the efficiency of synaptic transmission for persistent presynaptic activities. We consider attractor neural networks, for which the attractors are given, in the absence of STSP, by cell assemblies of excitatory cliques. We show that STSP may transform these attracting states into attractor relics, inducing ongoing transient-state dynamics in terms of sequences of transiently activated cell assemblies, the former attractors. Subsequent cell assemblies may be either disjoint or partially overlapping. It may hence be possible to use the resulting dynamics for the generation of motor control sequences.

Manuscript from author [PDF]

ES2017-89

Criticality in Biocomputation

Tjeerd olde Scheper

Abstract
Complexity in biological computation is one of the recognised means by which biological systems manage to function in a complex chaotic world. The ability to function and solve problems irrespective of scale and relative complexity, including higher-order interactions, is essential to the efficacy of biological systems. However, it has been unclear how the required complexity can be introduced to allow these functions to be realised. Nonlinear local interactions are required to combine into a global stable system. The property of criticality, which is exhibited by many nonlinear physical systems, can be exploited to allow local nonlinear oscillators to interact, resulting in a globally stable system. This concept introduces robustness as well as a means to control global stability.

Manuscript from author [PDF]

ES2017-65

Scholar Performance Prediction using Boosted Regression Trees Techniques

Bernardo Stearns, Fabio Rangel, Flavio Rangel, Fabrício Faria, Jonice Oliveira

Abstract
The possibility of predicting a student's performance based only on their socioeconomic status may help to infer what cultural features are important in education. This work was based on scores and socioeconomic data from the most popular exam to enter universities in Brazil: the National High School Exam. Statistical and computational methods used in data mining were applied to a data set of 8 million data points from Brazil's National High School Exam to examine the predictability of the performance in Mathematics based on socioeconomic status. The results showed that it is possible to predict a student's scores using two ensemble techniques: AdaBoost and Gradient Boosting. The latter presented better results.

Manuscript from author [PDF]

ES2017-80

Imitation learning for a continuum trunk robot

Milad Malekzadeh Shafaroudi, Jeffrey F. Queißer, Jochen J. Steil

Abstract
The paper applies learning from demonstration (LfD) for high-level trajectory planning and movement control of the Bionic Handling Assistant (BHA) robot. For such a soft continuum robot with mechanical elasticity and complex dynamics, it is difficult to use kinesthetic teaching to collect demonstration data. We propose to use an active compliant controller to this aim and record both position and orientation of the BHA's end-effector. Subsequently, this data is encoded with a state-of-the-art task-parameterized probabilistic Gaussian mixture model, and its performance and generalization are experimentally evaluated.

Manuscript from author [PDF]

ES2017-141

ELM vs. WiSARD: a performance comparison

Luiz Oliveira, Felipe França

Abstract
The Extreme Learning Machine (ELM) is known for being a fast learning neural model. This work presents a performance comparison between ELM and the WiSARD weightless neural network model, regarding training and testing times, and classification accuracy as well. The two models were implemented in the same programming language and experiments were carried out on the same hardware environment. By using a group of datasets from the public repositories UCI and Statlog, experimental results show that the WiSARD presented training times approximately one order of magnitude smaller than ELM, while classification accuracy varied according to the number of classes involved. However, while WiSARD's architecture setups were not exhaustively searched, architecture setups for ELM were kept the same as the ones found in the literature as the best for each given dataset.

Manuscript from author [PDF]

ES2017-12

A novel principle for causal inference in data with small error variance

Patrick Blöbaum, Shohei Shimizu, Takashi Washio

Abstract
Causal inference addresses the problem of identifying cause and effect variables in observed data. While most of the current techniques rely heavily on exploiting asymmetries in the error noise, these techniques struggle in data that only contain small noise. We present a novel principle for causal inference in data with small error variance. For this, we exploit an asymmetry in the prediction error under the assumption of additive noise and an independence between data generating mechanism and its input. The advantages of our approach are corroborated with empirical evaluations in artificial and real-world data sets.

Manuscript from author [PDF]

ES2017-10

Learning null space projections fast

Jeevan Manavalan, Matthew Howard

Abstract
Typically robot interactions with the environment may involve some type of constraint which impedes the motion of the system. This paper proposes an approach to learn kinematic constraints from observed movements. Our method derives the null space projection of a kinematically constrained system using gradient descent. Moreover, we compare this method to the existing brute force-based approach for learning constraints on datasets of different dimensionality, to demonstrate how it can learn constraints from datasets of a much higher dimensionality.

Manuscript from author [PDF]

ES2017-98

Comparison of adaptive MCMC methods

Edna Milgo, Nixon Ronoh, Peter Waiganjo Wagacha, Bernard Manderick

Abstract
We compare three adaptive MCMC samplers to the Metropolis-Hastings algorithm with optimal proposal distribution as our benchmark. We transform a simple Evolution Strategy algorithm into a sampler and show that it already outperforms the other samplers on the test suite used in the initial research on adaptive MCMC.

Manuscript from author [PDF]

ES2017-113

Pseudo-analytical solutions for stochastic options pricing using Monte Carlo simulation and Breeding PSO-trained neural networks

Sam Palmer, Denise Gorse

Abstract
A neural network is trained using a novel form of particle swarm optimisation to learn the pricing formula for European call options using training samples generated via a Monte Carlo process. The trained neural network has effectively learnt an approximate analytical solution, with errors shown statistically comparable to Monte Carlo pricing, alleviating the need to re-run computationally costly simulations for different model parameter settings.
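Training samples of the kind mentioned in this abstract could be produced by a Monte Carlo pricer along the following lines. This sketch assumes constant-volatility geometric Brownian motion under the risk-neutral measure, which may well be simpler than the stochastic model used in the paper. The function and parameter names are illustrative assumptions.

```python
import math
import random


def mc_european_call(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion.

    Assumption: constant volatility and interest rate; only the terminal price
    is simulated because the payoff depends on nothing else.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                    # standard normal shock
        terminal = s0 * math.exp(drift + vol * z)  # simulated price at maturity
        payoff_sum += max(terminal - strike, 0.0)  # call payoff
    return math.exp(-rate * maturity) * payoff_sum / n_paths  # discounted average


price = mc_european_call(s0=100.0, strike=100.0, rate=0.05, sigma=0.2,
                         maturity=1.0, n_paths=10_000)
```

Each call to the pricer yields one (parameters, price) training pair, and the cost of repeating this for every parameter setting is what a trained network would avoid.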

Manuscript from author [PDF]

ES2017-32

Spikes as regularizers

Anders Søgaard

Abstract
We present a confidence-based single-layer feed-forward learning algorithm, Spiral (Spike Regularized Adaptive Learning), relying on an encoding of activation spikes. We adaptively update a weight vector relying on confidence estimates and activation offsets relative to previous activity. We regularize updates proportionally to item-level confidence and weight-specific support, loosely inspired by the observation from neurophysiology that high spike rates are sometimes accompanied by low temporal precision. Our experiments suggest that the new learning algorithm Spiral is more robust and less prone to overfitting than both the averaged perceptron and Arow.

Manuscript from author [PDF]

ES2017-55

Moving Least Squares Support Vector Machines for weather temperature prediction

Zahra Karevan, Yunlong Feng, Johan A. K. Suykens

Abstract
Local learning methods have been investigated by many researchers. While global learning methods consider the same weight for all training points in model fitting, local learning methods assume that the training samples in the test point region are more influential. In this paper, we propose Moving Least Squares Support Vector Machines (M-LSSVM) in which each training sample is involved in the model fitting depending on the similarity between its feature vector and the one of the test point. The experimental results on an application of weather forecasting indicate that the proposed method can improve the prediction performance.

Manuscript from author [PDF]

ES2017-44

A Robust Minimal Learning Machine based on the M-Estimator

Joao Gomes, Diego Mesquita, Ananda Freire, Amauri Souza Junior, Tommi Karkkainen

Abstract
In this paper we propose a robust Minimal Learning Machine (R-MLM) for regression problems. The proposed method uses a robust M-estimator to generate a linear mapping between input and output distance matrices of MLM. The R-MLM was tested on one synthetic and three real world datasets that were contaminated with an increasing number of outliers. The method achieved a performance comparable to the robust Extreme Learning Machine (R-ELM) and thus can be seen as a valid alternative for regression tasks on datasets with outliers.

Manuscript from author [PDF]



Processing, Mining and Visualizing Massive Urban Data


ES2017-3

Processing, mining and visualizing massive urban data

Pierre Borgnat, Etienne Côme, Latifa Oukhellou

Abstract
The development of smart technologies and the advent of new observation capabilities have increased the availability of massive urban datasets that can greatly benefit urban studies. For example, a large amount of urban data is collected by various sensors, such as smart meters, or provided by GSM, Wi-Fi or Bluetooth records, ticketing data, geo-tagged posts on social networks, etc. Analysis of such digital records can help to build decision-making tools (for analytical, forecasting and display purposes) with a view to better understanding the operating of urban systems, to enable urban stakeholders to plan better when extending infrastructures and to provide better services to citizens in order to assist the development of the city and improve quality of life. This paper will focus on three main domains of application: transportation and mobility, water and energy.

Manuscript from author [PDF]

ES2017-93

Anomaly detection and characterization in smart card logs using NMF and Tweets

Emeric Tonnelier, Nicolas Baskiotis, Vincent Guigue, Patrick Gallinari

Abstract
This article describes a novel approach to detect anomalies in smart card logs. In this study, we chose to work on a 24-hour basis for every station in the Parisian metro network. We also consider the 7 days of the week separately. We first build a robust averaged reference for (day, station) pairs and then focus on the difference between particular situations and references. All experiments are conducted both on the raw data and using an NMF denoised approximation of the log flow. We demonstrate the interest and the robustness of the latter strategy. Then we mine the RATP Twitter account to obtain ground truth information about operating incidents. This synchronized flow is used to evaluate our models.

Manuscript from author [PDF]

ES2017-25

Using degree constrained gravity null-models to understand the structure of journeys' networks in bicycle sharing systems

Remy Cazabet, Pierre Borgnat, Pablo Jensen

Abstract
Bicycle Sharing Systems are now ubiquitous in large cities around the world. In most of these systems, journeys' data can be extracted, providing rich information to better understand them. Recent works have used network analysis, and in particular space-corrected community detection, to analyse such datasets. In this paper, we show that spatial-null models used in previous methods have a systematic bias, and we propose a degree-constrained null-model to improve the results. We finally apply the proposed method on the BSS of a city.

Manuscript from author [PDF]

ES2017-138

A neuro-symbolic approach to GPS trajectory classification

Diego Carvalho, Felipe França, Raul Barbosa, Douglas Cardoso

Abstract
This paper proposes approaches to the GPS trajectory classification problem in the context of Rio de Janeiro's public transit system. The approaches are inspired by the neuro-symbolic sense of adding knowledge from the domain as opposed to the use of a raw machine learning approach. Experimental results show performance boosts when using these strategies.

Manuscript from author [PDF]

ES2017-22

Non-negative matrix factorization as a pre-processing tool for travelers temporal profiles clustering

Léna Carel, Pierre Alquier

Abstract
We propose to use non-negative matrix factorization (NMF) to build a dictionary of travelers' temporal profiles. Clustering based on decomposition in this dictionary rather than on the full profiles (as in previous works) leads to more interpretable clusters.

Manuscript from author [PDF]

ES2017-31

Extracting urban water usage habits from smart meter data: a functional clustering approach

Nicolas CHEIFETZ, Allou Samé, Zineb Sabir, Anne-Claire Sandraz, Cédric Féliers

Abstract
The recent development of smart grids offers, through automated meter reading systems, the opportunity for an efficient and responsible management of water resources. In this framework, the present paper describes a novel methodology for identifying relevant usage profiles from hourly water consumption series collected by smart meters located on a water distribution network. The proposed approach operates in two stages. First, an additive time series decomposition model is used in order to extract seasonal patterns from the time series, which are intended to represent the customers' habits in terms of water consumption. Then, two functional clustering approaches are used to group the extracted seasonal patterns into homogeneous clusters: a functional version of the well-known K-means algorithm, and a Fourier regression mixture-model-based algorithm. The two clustering strategies are applied to real world data from a smart grid deployed on a large water distribution network in France and a realistic interpretation of the consumption habits is given to each cluster.

Manuscript from author [PDF]

ES2017-72

Multiscale Spatio-Temporal Data Aggregation and Mapping for Urban Data Exploration

Anaïs Remy, Etienne Côme

Abstract
Maps seem the most intuitive way to visualize massive urban data but they also raise some well-known graphical problems (such as visual clutter, etc.). This paper focuses on processing massive spatio-temporal data in order to ease multi-scale exploration. To this end, we describe a preprocessing tool that enables the automatic creation of a multi-resolution grid from a high resolution grid of spatio-temporal data in a format compatible with webmapping applications (vector tiles). The use of this tool is exemplified through a prototype that offers the possibility to navigate into a massive itinerary request dataset collected in the Ile-de-France region.

Manuscript from author [PDF]

ES2017-73

Detection of non-recurrent road traffic events based on clustering indicators

Pierre-Antoine Laharotte, Romain Billot, Nour-Eddin El Faouzi

Abstract
We propose a new indicator for detecting non-recurrent road traffic conditions. The idea is based on the perplexity of a generative probabilistic model (LDA) used for predicting traffic patterns. The resulting filter method reduces the inaccuracies of comparable detection methods and enables a better separation between usual traffic patterns and non-recurrent situations.

Manuscript from author [PDF]



Signal and image processing, collaborative filtering


ES2017-23

Collaborative filtering with neural networks

Josef Feigl, Martin Bogdan

Abstract
Collaborative filtering methods try to determine a user's preferences given their historical usage data. In this paper, a flexible neural network architecture to solve collaborative filtering problems is reviewed and further developed. It will be shown how modern adaptive learning rate methods can be modified to allow the network to be trained in about half the time without sacrificing any predictive performance. Additionally, the effects of Dropout on the performance of the model are evaluated. The results of this approach are demonstrated on the Netflix Prize dataset.

Manuscript from author [PDF]

ES2017-45

Investigating optical transmission error correction using wavelet transforms

Weam Binjumah, Alexey Redyuk, Rod Adams, Neil Davey, Yi Sun

Abstract
Reducing the bit error rate and improving the performance of modern coherent optical communication systems is a significant issue. As the distance travelled by the information signal increases, the bit error rate degrades. Support Vector Machines are the most up-to-date machine learning method for error correction in optical transmission systems. The wavelet transform has been a popular method for signal processing. In this study, results show that the bit error rate can be improved by using classification based on wavelet transforms (WT) and support vector machines (SVM).

Manuscript from author [PDF]

ES2017-133

WiSARDrp for Change Detection in Video Sequences

Massimo De Gregorio, Giordano Maurizio

Abstract
Weightless neural networks have been successfully used as learners and detectors of background regions in video processing, as they feature a fast learning algorithm, noise tolerance and an incremental update of learnt knowledge, also referred to as online training. These features make weightless neural networks suitable and effective to be used for change (motion) detection in scenarios in which environmental changes (light, camera view, cluttered background) and moving objects force the modeling of background regions to change continuously and in drastic ways. In this paper, we present a change detection method in video processing that uses a weightless neural system, called WiSARDrp, as underlying learning mechanism, equipped with a reinforcing/weakening scheme, that builds and continuously updates a model of background at pixel-level. The performance of the proposed background modeling and change detection techniques is evaluated on the ChangeDetection.net video archive.

Manuscript from author [PDF]

ES2017-152

Learning human behaviors and lifestyle by capturing temporal relations in mobility patterns

Eyal Ben Zion, Boaz Lerner

Abstract
Many applications benefit from learning human behaviors and lifestyle. Different trajectories can represent a behavior, and previous behaviors and trajectories can influence decisions on further behaviors and on visiting future places and taking familiar or new trajectories. To more accurately explain and predict personal behavior, we extend a topic model to capture temporal relations among previous trajectories/weeks and current ones. In addition, we show how different trajectories may have the same latent cause, which we relate to lifestyle. The code for our algorithm is available online.

Manuscript from author [PDF]

ES2017-104

Hierarchical Combination of Video Features for Personalised Pain Level Recognition

Patrick Thiam, Viktor Kessler, Friedhelm Schwenker

Abstract
In this work, we present a personalized, participant-independent pain recognition system based on the video channel. Instead of using an entire annotated dataset to train a classification model that would be later applied to an unseen participant, a similarity metric is used to select the most interesting annotated samples based on the data of the unseen participant. These samples are subsequently used to train a model adapted to the unseen participant. The selection process helps to avoid redundant and irrelevant data samples, and thus improves the performance as well as the efficiency of the trained model. From the video channel, several features are extracted and subsequently fed into a hierarchical fusion architecture to further improve the performance of the system.

Manuscript from author [PDF]

ES2017-29

A performance acceleration algorithm of spectral unmixing via subset selection

Jing Ke, Yi Guo, Arcot Sowmya, Tomasz Bednarz

Abstract
An acceleration algorithm for spectral unmixing is proposed based on subset selection. The method classifies the pixels in a spectral image into accurate and approximated unmixing groups based on the similarity and dissimilarity of geomorphological features in neighboring areas. Real spectral images are used for unmixing benchmark tests for accuracy and performance verification. The results reveal good performance speedup with only small accuracy loss.

Manuscript from author [PDF]

ES2017-16

Myoelectrical signal classification based on S transform and two-directional 2DPCA

Hong-Bo Xie, Hui Liu

Abstract
In order to extract discriminative information, the time-frequency matrix is often transformed into a 1D vector followed by principal component analysis. This study contributes a two-directional two-dimensional principal component analysis (2D2PCA) based technique for time-frequency feature extraction. 2D2PCA is directly conducted on the time-frequency matrix obtained from the S transform rather than 1D vectors for feature extraction. The proposed method can significantly reduce the computational cost while capturing the directions of maximal time-frequency matrix variance. The efficiency and effectiveness of the proposed method are demonstrated by classifying eight hand motions using four-channel myoelectric signals recorded in healthy subjects and amputees.

Manuscript from author [PDF]

ES2017-40

Hyper-spectral frequency selection for the classification of vegetation diseases

Klaas Dijkstra, Jaap van de Loosdrecht, Lambert Schomaker, Marco Wiering

Abstract
Reducing the use of pesticides by early visual detection of diseases in precision agriculture is important. Because of the color similarity between potato-plant diseases, narrow band hyper-spectral imaging is required. Payload constraints on unmanned aerial vehicles require reduction of spectral bands. Therefore, we present a methodology for per-pixel classification combined with hyper-spectral band selection. In controlled experiments performed on a set of individual leaves, we measure the performance of five classifiers and three dimensionality-reduction methods with three patch sizes. With the best-performing classifier an error rate of 1.5% is achieved for distinguishing two important potato-plant diseases.

Manuscript from author [PDF]

ES2017-36

Outlining a simple and robust method for the automatic detection of EEG arousals

Isaac Fernández-Varela, Diego Álvarez-Estévez, Elena Hernández-Pereira, Vicente Moret-Bonillo

Abstract
This work proposes a new technique for the automatic detection of electroencephalographic (EEG) arousals in sleep polysomnographic recordings. We have developed a computationally simple algorithm with the aim of allowing easy integration into different software platforms. The approach combines different well-known signal analyses to identify relevant arousal patterns. Special emphasis is placed on producing a robust, artifact-tolerant algorithm. The resulting approach was tested using a database of 6 polysomnographic recordings from real patients, achieving an average kappa index of 0.77 with respect to the visual scorings made by clinical experts.
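
For readers unfamiliar with the kappa index quoted above, here is a minimal sketch of Cohen's kappa between two label sequences. The per-epoch labelling and the example data are hypothetical; the abstract does not say how events were aligned for scoring.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where both scorers agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each scorer's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both scorers used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical per-epoch labels: 1 = arousal, 0 = no arousal.
algorithm = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
expert = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
print(round(cohen_kappa(algorithm, expert), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because arousal epochs are much rarer than non-arousal epochs.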

Manuscript from author [PDF]

ES2017-39

A decision support system based on cellular automata to help the control of late blight in tomato cultures

Gizelle Vianna, Gustavo Oliveira, Gabriel Cunha

Abstract
We designed and implemented a decision support system for small tomato producers that investigates ways to recognize the late blight disease from the analysis of digital images of tomatoes, using a pair of multilayer perceptron neural networks. The networks' outputs are used to calculate the damage level of each plant and to construct a situation map of a farm, on which a cellular automaton simulates the evolution of the outbreak over the fields. The simulator can test different pesticide actions, helping to decide when to start spraying and to analyze the losses and gains of each choice of action.

Manuscript from author [PDF]

ES2017-139

Comparison of manual and semi-manual delineations for classifying glioblastoma multiforme patients based on histogram and texture MRI features

Adrian Ion-Margineanu, Sofie Van Cauter, Diana M Sima, Frederik Maes, Stefaan Sunaert, Uwe Himmelreich, Sabine Van Huffel

Abstract
In this paper we study the task of classifying the follow-up course of brain tumour patients who had surgery. Multiple magnetic resonance imaging brain scans were taken for each patient. We propose a simple method of delineating the contrast enhancing tumour lesion based on the total tumour region. We compare balanced accuracy values after tuning SVM-lin and SVM-rbf on histogram and 3-D texture features extracted from semi-manual and manual delineations. Results show that our proposed delineating method outperforms the classical method.

Manuscript from author [PDF]

ES2017-60

Latent variable analysis in hospital electric power demand using non-negative matrix factorization

Diego García, Ignacio Díaz, Daniel Pérez, Abel Cuadrado, Manuel Domínguez

Abstract
Energy disaggregation techniques have recently attracted much interest, since they make it possible to obtain latent patterns from power demand data in buildings, revealing useful information to the user. Unsupervised methods are especially attractive, since they do not require labeled datasets. In particular, non-negative matrix factorization (NMF) methods make it possible to decompose a single power demand measurement over a certain time period into a set of components or "parts" that are sparse, non-negative and sum up to the original measured quantity. Such components reveal hidden temporal patterns and events along this period, related to scheduling events and/or demand patterns from subsystems in the network, that are very useful within an energy efficiency context. In this paper we use this approach on demand data from a hospital during a one-year period, using a calendar visualization of the components, revealing relevant facts about the energy expenditure.
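
The decomposition described above can be written in the generic NMF form below. This is a sketch under assumptions: the arrangement of the demand data into a matrix V (for example D days by T samples per day), the number of components K and the plain Frobenius objective are illustrative choices, not details taken from the paper.

```latex
\begin{aligned}
V &\approx W H = \sum_{k=1}^{K} \mathbf{w}_k \mathbf{h}_k^{\top},
  \qquad V \in \mathbb{R}_{\ge 0}^{D \times T},\;
  W \in \mathbb{R}_{\ge 0}^{D \times K},\;
  H \in \mathbb{R}_{\ge 0}^{K \times T}, \\
(W, H) &= \arg\min_{W \ge 0,\, H \ge 0} \; \lVert V - W H \rVert_F^{2}
\end{aligned}
```

Each rank-one term is itself non-negative, so the K "parts" add up to (an approximation of) the measured demand without cancelling one another. Any explicit sparsity penalty the authors may use is not shown here.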

Manuscript from author [PDF]

ES2017-91

Supporting generative models of spatial behavior by user interaction

Ronny Hug, Wolfgang Hübner, Michael Arens

Abstract
The analysis of spatial behavior in terms of motion profiles recorded along trajectories is a widely used technique in video analysis. Inherent to this approach is the problem of assigning a meaningful score to observations. This score forms the basis for classification, ranking, or the generation of user feedback. Score assignment can be done in terms of deviations from normal behavior, where normality is determined by learning a generative model. A general drawback is that the unsupervised learning process often assigns non-intuitive scores. To address this problem, this paper proposes the use of interactive concepts that support the learning process. Interaction relies strongly on the generative model's capability to synthesize samples, which gives insight into the underlying representation. Initial results are shown on a trajectory rating task, illustrating the feasibility of the proposed approach.

Manuscript from author [PDF]



Algorithmic Challenges in Big Data Analytics


ES2017-6

Algorithmic challenges in big data analytics

Veronica Bolon-Canedo, Beatriz Remeseiro, Konstantinos Sechidis, David Martínez-Rego, Amparo Alonso-Betanzos

Abstract
This session studies specific challenges that Machine Learning (ML) algorithms have to tackle when faced with Big Data problems. These challenges can arise when any of the dimensions in an ML problem grows significantly: a) size of training set, b) size of test set or c) dimensionality. The studies included in this edition explore the extension of previous ML algorithms and practices to Big Data scenarios. Namely, specific algorithms for recurrent neural network training, ensemble learning, anomaly detection and clustering are proposed. The results obtained show that this new trend of ML problems presents both a challenge and an opportunity to obtain results which could allow ML to be integrated in many new applications in years to come.

Manuscript from author [PDF]

ES2017-18

Partition-wise Recurrent Neural Networks for Point-based AIS Trajectory Classification

Xiang Jiang, Erico N de Souza, Xuan Liu, Behrouz Haji Soleimani, Xiaoguang Wang, Daniel L. Silver, Stan Matwin

Abstract
We present Partition-wise Recurrent Neural Networks (pRNNs) for point-based trajectory classification to detect fishing activities in the ocean. This method partitions each feature and uses region-specific parameters for distinct partitions, which can greatly improve the expressive power of deep recurrent neural networks on low-dimensional yet heterogeneous trajectory data. We show that our approach outperforms the state-of-the-art systems.

Manuscript from author [PDF]

ES2017-35

Scalable approximate k-NN Graph construction based on Locality Sensitive Hashing

Carlos Eiras-Franco, Leslie Kanthan, Amparo Alonso-Betanzos, David Martínez-Rego

Abstract
Nearest neighbour graphs are a pervasive basic construct in areas such as Data Mining, Machine Learning and Information Retrieval. Among them, the k Nearest Neighbours Graph (kNNG) is probably the most studied of all. Unfortunately, its naïve construction is in O(n^2) for n data points, which becomes a quagmire when scaling to Big Data. However, sub-quadratic construction of the kNNG remains an open question. This paper explores an adaptive algorithm based on Locality Sensitive Hashing which presents good performance on distributed architectures.
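
To make the general idea concrete, here is a minimal single-machine sketch of approximate k-NN graph construction with random-hyperplane LSH. It is not the authors' adaptive, distributed algorithm: the function name, the fixed `n_bits` and `n_tables` parameters and the sample data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def lsh_knn_graph(points, k, n_bits=4, n_tables=6, seed=0):
    """Approximate k-NN graph: exact distances only within shared LSH buckets."""
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = defaultdict(set)
    for _ in range(n_tables):
        # One hash table: n_bits random hyperplanes, each contributes a sign bit.
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(sum(a * b for a, b in zip(plane, p)) >= 0
                        for plane in planes)
            buckets[key].append(idx)
        # Points sharing a bucket become candidate neighbours of each other.
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)

    def dist2(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))

    # Keep the k closest candidates of every point.
    return {i: sorted(candidates[i], key=lambda j: dist2(i, j))[:k]
            for i in range(len(points))}


# Hypothetical data: 50 random 3-D points centred on the origin.
data_rng = random.Random(1)
points = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
graph = lsh_knn_graph(points, k=3)
```

The cost is driven by bucket sizes rather than by n^2, so the quality and speed of the result depend on how the hash parameters are chosen, which is the kind of tuning an adaptive scheme aims to automate.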

Manuscript from author [PDF]

ES2017-110

Degrees of Freedom in Regression Ensembles

Reeve Henry, Gavin Brown

Abstract
Negative correlation learning is an effective approach to ensemble learning in which model diversity is encouraged through a correlation penalty term. The level of emphasis placed upon the correlation penalty term is controlled by the diversity parameter. We shall provide a degrees of freedom analysis of negative correlation learning. Our contributions are as follows: we give an exact formula for the effective degrees of freedom in a negative correlation ensemble with fixed basis functions; we show that the effective degrees of freedom is a continuous, convex and monotonically increasing function of the diversity parameter; finally, we show that the degrees of freedom formula gives rise to an efficient way to tune the diversity parameter on large data sets.

Manuscript from author [PDF]

ES2017-82

Mutual information for improving the efficiency of the SCH algorithm

Diego Fernandez-Francos, Oscar Fontenla-Romero, Amparo Alonso-Betanzos, Gavin Brown

Abstract
A new approach to improve the efficiency of a one-class classification algorithm, making it more suitable for big datasets, is presented in this work. The original algorithm, called the SCH (Scaled Convex Hull) algorithm, approximates a D-dimensional convex hull decision by means of random projections and an ensemble of 2-dimensional decisions. With this new approach we try to get rid of the redundant projections that lead to similar classification models in the low-dimensional space. After the training phase, a new stage based on mutual information is added to the original algorithm in order to select the essential projections and remove the unnecessary ones, providing a lightweight classification model. This significantly reduces the computational complexity of the testing phase and preserves the performance of the original method. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of this approach.
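
One way such a pruning stage could work is sketched below: compute the mutual information between the decision vectors produced by different projections and drop any projection that is highly informative about one already kept. The greedy rule, the threshold and the toy decision vectors are assumptions; the abstract does not describe the actual selection criterion.

```python
from collections import Counter
from math import log2


def mutual_information(u, v):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(u)
    pu, pv, puv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum((c / n) * log2(c * n / (pu[a] * pv[b]))
               for (a, b), c in puv.items())


def prune_redundant(decisions, threshold):
    """Greedily keep projections whose decisions add new information."""
    kept = []
    for idx, d in enumerate(decisions):
        if all(mutual_information(d, decisions[j]) < threshold for j in kept):
            kept.append(idx)
    return kept


# Hypothetical inside/outside decisions of three 2-D projections on four samples.
decisions = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # duplicates the first projection
    [0, 1, 0, 1],  # independent of the first projection
]
print(prune_redundant(decisions, threshold=0.5))
```

Only the kept projections need to be evaluated at test time, which is where the saving in the testing phase would come from.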

Manuscript from author [PDF]

ES2017-87

A distributed approach for classification using distance metrics

Laura Morán-Fernández, Veronica Bolon-Canedo, Amparo Alonso-Betanzos

Abstract
To cope with the huge quantity of data brought about by the fast development of sensing, networking and inexpensive data storage, many distributed approaches have been developed in recent years. The main reason is that, when dealing with large datasets, most existing data mining algorithms do not scale well, and their efficiency may deteriorate significantly. Thus, we present an approach distributed by samples, in which the original dataset is divided among several nodes or processors. To classify a new test sample, we first compute its distance to the data on each node, and the sample is then classified by the model learned from the "closest" data. The proposed method has proved to be useful, demonstrating important savings in runtime and satisfactory performance.
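
The routing step described above can be sketched as follows. Everything specific here is an assumption made for illustration: the `Node` class, the use of a centroid as the distance summary for each node, and the 1-nearest-neighbour local model; the abstract does not state which distance metric or classifier is used.

```python
def centroid(rows):
    """Component-wise mean of a list of equally long vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """One partition of the training set with its own local model."""

    def __init__(self, samples, labels):
        self.samples, self.labels = samples, labels
        self.center = centroid(samples)  # summary used to measure closeness

    def predict(self, x):
        # Local model: 1-nearest-neighbour over this node's samples only.
        best = min(range(len(self.samples)),
                   key=lambda i: sq_dist(x, self.samples[i]))
        return self.labels[best]


def classify(nodes, x):
    """Route x to the node whose data is closest, then use that node's model."""
    closest = min(nodes, key=lambda node: sq_dist(x, node.center))
    return closest.predict(x)


# Hypothetical two-node split of a tiny dataset.
nodes = [
    Node([(0, 0), (0, 1), (1, 0)], ["a", "a", "b"]),
    Node([(10, 10), (10, 11), (11, 10)], ["b", "b", "a"]),
]
print(classify(nodes, (0.1, 0.2)), classify(nodes, (10, 10.8)))
```

Each test sample touches one distance per node plus a single local model, rather than a model trained on the whole dataset.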

Manuscript from author [PDF]



Deep learning


ES2017-48

Local Lyapunov Exponents of Deep RNN

Claudio Gallicchio, Alessio Micheli, Luca Silvestri

Abstract
The study of deep Recurrent Neural Network (RNN) models represents a research topic of increasing interest. In this paper we investigate layered recurrent architectures from a dynamical system point of view, focusing on characterizing the fundamental aspect of stability. To this end we provide a framework that allows the analysis of deep RNN dynamical regimes through the study of the maximum among the local Lyapunov exponents. Applied to the case of Reservoir Computing networks, our investigation also provides insights on the true merits of layering in RNN architectures, effectively showing how increasing the number of layers eventually results in progressively less stable global dynamics.

Manuscript from author [PDF]

ES2017-61

Learning Semantic Prediction using Pretrained Deep Feedforward Networks

Jörg Wagner, Volker Fischer, Michael Herman, Sven Behnke

Abstract
The ability to predict future environment states is crucial for anticipative behavior of autonomous agents. Deep learning based methods have proven to solve key perception challenges but currently mainly operate in a non-predictive fashion. We bridge this gap by proposing an approach to transform trained feed-forward networks into predictive ones via a combination of a recurrent predictive module with a teacher-student training strategy. This transformation can be conducted without the need of labeled data in a fully self-supervised fashion. Using simulated data, we demonstrate the ability of the resulting model to temporally predict a task-specific representation and additionally show the benefits of using our approach even when no corresponding feed-forward model is available.

Manuscript from author [PDF]

ES2017-102

Deep convolutional neural networks for detecting noisy neighbours in cloud infrastructure

Bruno Ordozgoiti, Alberto Mozo, Sandra Gómez Canaval, Udi Margolin, Elisha Rosensweig, Itai Segall

Abstract
Cloud infrastructure in data centers is expected to be one of the main technologies supporting Internet communications in the next few years. Virtualization is employed to achieve the flexibility and dynamicity required by the wide variety of applications used today. Therefore, optimal allocation of virtual machines is key to ensuring performance and efficiency. Noisy neighbor is a term used to describe virtual machines competing for physical resources and thus disturbing each other, a phenomenon that can dramatically degrade their performance. Detecting noisy neighbors using simple thresholding approaches is ineffective. To exploit the time-series nature of cloud infrastructure monitoring data, we propose an approach based on deep convolutional networks. We test it on real infrastructure data and show that it outperforms well-known classifiers in the detection of noisy neighbors.

Manuscript from author [PDF]

ES2017-109

Real-time convolutional networks for sonar image classification in low-power embedded systems

Matias Valdenegro-Toro

Abstract
Deep Neural Networks have impressive classification performance, but this comes at the expense of significant computational resources at inference time. Autonomous Underwater Vehicles use low-power embedded systems for sonar image perception, and cannot execute large neural networks in real-time. We propose the aggressive use of max-pooling, and we demonstrate it with a Fire-based module and a new Tiny module that includes max-pooling in each module. By stacking them we build networks that achieve the same accuracy as bigger ones, while reducing the number of parameters and considerably increasing computational performance. Our networks can classify a 96 × 96 sonar image with 98.8 to 99.7% accuracy in only 41 to 61 milliseconds on a Raspberry Pi 2, which corresponds to speedups of 28.6 to 19.7.

Manuscript from author [PDF]

ES2017-30

Approximate operations in Convolutional Neural Networks with RNS data representation

Valentina Arrigoni, Beatrice Rossi, Pasqualina Fragneto, Giuseppe Desoli

Abstract
In this work we modify the inference stage of a generic CNN by approximating computations using a data representation based on a Residue Number System at low-precision and introducing rescaling stages for weights and activations. In particular, we exploit an innovative procedure to tune up the system parameters that handles the reduced resolution while minimizing rounding and overflow errors. Our method decreases the hardware complexity of dot product operators and enables a parallelized implementation operating on values represented with few bits, with minimal loss in the overall accuracy of the network.
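
To illustrate why a Residue Number System suits dot products, the sketch below computes a small integer dot product channel by channel and reconstructs the result with the Chinese Remainder Theorem. The moduli and helper names are hypothetical, and the rescaling stages and parameter tuning described in the abstract are not modelled.

```python
from math import prod

MODULI = (7, 11, 13)   # hypothetical pairwise-coprime moduli
RANGE = prod(MODULI)   # 1001 distinct representable values


def to_rns(x):
    """Residues of an integer with respect to each modulus."""
    return tuple(x % m for m in MODULI)


def rns_dot(xs, ws):
    """Dot product computed independently in each small residue channel."""
    acc = [0] * len(MODULI)
    for x, w in zip(xs, ws):
        rx, rw = to_rns(x), to_rns(w)
        for i, m in enumerate(MODULI):
            acc[i] = (acc[i] + rx[i] * rw[i]) % m
    return tuple(acc)


def from_rns(residues):
    """Chinese Remainder Theorem reconstruction, mapped to a signed range."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = RANGE // m
        total += r * partial * pow(partial, -1, m)
    value = total % RANGE
    return value - RANGE if value > RANGE // 2 else value


print(from_rns(rns_dot([3, -2, 5], [4, 6, -1])))   # true result fits the range
print(from_rns(rns_dot([100, 100], [10, 10])))     # true result 2000 overflows
```

No carries propagate between channels, so each channel can run in parallel on a few bits. The second call shows the price: a result outside the representable range wraps around silently, which is why the operands must be rescaled to keep results in range.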

Manuscript from author [PDF]

ES2017-33

Learning convolutional neural network to maximize Pos@Top performance measure

Yanyan Geng, Liang Ru-Ze, Weizhi Li, Jingbin Wang, Liang Gaoyuan, Xu Chenhao, Wang Jing-Yan

Abstract
In machine learning problems, a performance measure is used to evaluate the learned models. Recently, the number of positive data points ranked at the top positions (Pos@Top) has become a popular performance measure in the machine learning community. In this paper, we propose to learn a convolutional neural network (CNN) model to maximize the Pos@Top performance measure. The CNN model is used to represent the multi-instance data point, and a classifier function is used to predict the label from its CNN representation. We propose to minimize the loss function of Pos@Top over a training set to learn the filters of the CNN and the classifier parameter. The classifier parameter vector is solved by the Lagrange multiplier method, and the filters are updated by the gradient descent method alternately in an iterative algorithm. Experiments over benchmark data sets show that the proposed method outperforms the state-of-the-art Pos@Top maximization methods.
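
As a small illustration of the measure itself, the sketch below counts the positives that are scored above every negative, which is one common reading of Pos@Top. The strict-inequality tie rule and the example scores are assumptions; the abstract does not give the exact definition used.

```python
def pos_at_top(scores, labels):
    """Count positives scored strictly above the best-scoring negative."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives:
        return len(positives)
    top_negative = max(negatives)
    return sum(s > top_negative for s in positives)


# Hypothetical classifier scores with binary labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4]
labels = [1, 1, 0, 1, 0]
print(pos_at_top(scores, labels))
```

The measure depends only on the single highest-scoring negative, so a training loss built on it concentrates on the top of the ranking rather than on average accuracy.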

Manuscript from author [PDF]

ES2017-122

Active learning strategy for CNN combining batchwise Dropout and Query-By-Committee

Melanie Ducoffe, Frédéric Precioso

Abstract
While the current trend is to increase the depth of neural networks to improve their performance, the size of the training database has to grow accordingly. We thus observe the emergence of very large databases, although providing labels to build a training set remains a very expensive task. In this paper, we tackle the problem of selecting the samples to be labeled in an online fashion. We present an active learning strategy based on query by committee and the dropout technique to train a Convolutional Neural Network (CNN). We evaluate our active learning strategy for CNN on the MNIST and USPS benchmarks, showing in particular that selecting less than 22% of the annotated database is enough to reach an error rate similar to that obtained with the full training set.
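
The selection step of query by committee can be sketched as below: samples on which the committee members disagree most are sent for labeling. Vote entropy as the disagreement score, the function names and the toy votes are assumptions; in the paper the committee is derived from batchwise dropout, which is not modelled here.

```python
from collections import Counter
from math import log


def vote_entropy(votes):
    """Entropy of the committee's predicted labels for one sample."""
    n = len(votes)
    return -sum((c / n) * log(c / n) for c in Counter(votes).values())


def select_queries(committee_votes, budget):
    """Return the `budget` sample ids with the highest committee disagreement."""
    ranked = sorted(committee_votes,
                    key=lambda sample: vote_entropy(committee_votes[sample]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical votes of four committee members on three unlabeled samples.
committee_votes = {
    "a": [1, 1, 1, 1],  # full agreement
    "b": [1, 2, 1, 2],  # maximal disagreement
    "c": [1, 1, 1, 2],  # mild disagreement
}
print(select_queries(committee_votes, budget=2))
```

Samples on which the committee already agrees are unlikely to change the model, so the labeling budget goes to the contested ones.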

Manuscript from author [PDF]

ES2017-115

A Deep Q-Learning Agent for L-Game with Variable Batch Training

Petros Giannakopoulos, Yannis Cotronis

Abstract
We employ the Deep Q-Learning algorithm with Experience Replay to train an agent capable of achieving a high level of play in the L-Game while self-learning from low-dimensional states. We also employ variable batch size for training in order to mitigate the loss of the rare reward signal and significantly accelerate training. Despite the large action space due to the number of possible moves, the low-dimensional state space and the rarity of rewards, which only come at the end of a game, DQL is successful in training an agent capable of strong play without the use of any search methods or domain knowledge.

Manuscript from author [PDF]

ES2017-100

TimeNet: Pre-trained deep recurrent neural network for time series classification

Pankaj Malhotra, Vishnu TV, Lovekesh Vig, Puneet Agarwal, Gautam Shroff

Abstract
Inspired by the tremendous success of deep Convolutional Neural Networks as generic feature extractors for images, we propose TimeNet: a deep recurrent neural network (RNN) trained on diverse time series in an unsupervised manner using sequence to sequence (seq2seq) models to extract features from time series. Rather than relying on data from the problem domain, TimeNet attempts to generalize time series representation across domains by ingesting time series from several domains simultaneously. Once trained, TimeNet can be used as a generic off-the-shelf feature extractor for time series. The representations or embeddings given by a pre-trained TimeNet are found to be useful for time series classification (TSC). For several publicly available datasets from the UCR TSC Archive and an industrial telematics sensor dataset from vehicles, we observe that a classifier learned over the TimeNet embeddings yields significantly better performance compared to (i) a classifier learned over the embeddings given by a domain-specific RNN, as well as (ii) a nearest neighbor classifier based on Dynamic Time Warping.

Manuscript from author [PDF]

ES2017-56

Uncertain photometric redshifts via combining deep convolutional and mixture density networks

Antonio D'Isanto, Kai Lars Polsterer

Abstract
The need for accurate photometric redshift estimation is a major subject in Astronomy. This is due to the necessity of efficiently obtaining redshift information without the need for spectroscopic analysis. We propose a method for determining accurate multi-modal predictive densities for redshift, using Mixture Density Networks and Deep Convolutional Networks. A comparison with the Random Forest is carried out and superior performance of the proposed architecture is demonstrated.
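
For context, a mixture density network outputs the parameters of a mixture distribution rather than a single point estimate. The generic Gaussian form is sketched below; the number of components K, the Gaussian component choice and the negative log-likelihood loss are standard assumptions, and the abstract does not state which loss the authors actually train with.

```latex
\begin{aligned}
p(z \mid \mathbf{x}) &= \sum_{k=1}^{K} \pi_k(\mathbf{x})\,
  \mathcal{N}\!\bigl(z;\, \mu_k(\mathbf{x}),\, \sigma_k^{2}(\mathbf{x})\bigr),
  \qquad \sum_{k=1}^{K} \pi_k(\mathbf{x}) = 1,\; \pi_k(\mathbf{x}) \ge 0, \\
\mathcal{L} &= -\sum_{n=1}^{N} \log p(z_n \mid \mathbf{x}_n)
\end{aligned}
```

Here x is the input (in this setting, image features produced by the convolutional layers) and z is the redshift. With K > 1 the predictive density can have several modes, which a single-value regressor cannot express.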

Manuscript from author [PDF]

ES2017-90

Feature Extraction and Learning for RSSI based Indoor Device Localization

Stavros Timotheatos, Grigorios Tsagkatakis, Panagiotis Tsakalides, Panos Trahanias

Abstract
In this paper, we study and experimentally compare two state-of-the-art methods for low-dimensional feature extraction, within the context of RSSI fingerprinting for localization. On the one hand, we consider Stacked Autoencoders, a prominent example of a deep learning architecture, while on the other hand, we explore Random Projections, a universal feature extraction approach. Experimental results suggest that feature learning has a dramatic impact on subsequent analysis such as location-based classification.

Manuscript from author [PDF]
