ESANN2018

26th European Symposium on Artificial Neural Networks
Bruges, Belgium, April 25-26-27


Content of the proceedings




Deep learning and image processing


ES2018-166

A Sub-Layered Hierarchical Pyramidal Neural Architecture for Facial Expression Recognition

Henrique Siqueira, Pablo Barros, Sven Magg, Cornelius Weber, Stefan Wermter

Abstract
In domains where computational resources and labeled data are limited, such as in robotics, deep networks with millions of weights might not be the optimal solution. In this paper, we introduce a connectivity scheme for pyramidal architectures to increase their capacity for learning features. Experiments on facial expression recognition of unseen people demonstrate that our approach is a potential candidate for applications with restricted resources, due to good generalization performance and low computational cost. We show that our approach generalizes as well as convolutional architectures in this task but uses fewer trainable parameters and is more robust for low-resolution faces.

Manuscript from author [PDF]

ES2018-102

Interpretation of convolutional neural networks for speech regression from electrocorticography

Miguel Angrick, Christian Herff, Garett Johnson, Jerry Shih, Dean Krusienski, Tanja Schultz

Abstract
The direct synthesis of continuously spoken speech from neural activity is envisioned to enable fast and intuitive Brain-Computer Interfaces. Earlier results indicate that intracranial recordings reveal very suitable signal characteristics for direct synthesis. To map the complex dynamics of neural activity to spectral representations of speech, Convolutional Neural Networks (CNNs) can be trained. However, the resulting networks are hard to interpret and thus provide little opportunity to gain insights on neural processes underlying speech. Here, we show that CNNs are useful to reconstruct speech from intracranial recordings of brain activity and propose an approach to interpret the trained CNNs.

Manuscript from author [PDF]

ES2018-188

Transferring style in motion capture sequences with adversarial learning

Qi Wang, Mickael Chen, Thierry Artieres, Ludovic Denoyer

Abstract
We focus on style transfer for sequential data in a supervised setting. Assuming sequential data include both content and style information, we want to learn models able to transform a sequence into another one with the same content information but with the style of another sequence, from a training dataset where content and style labels are available. Following work on image generation and editing with adversarial learning, we explore the design of neural network architectures for the task of sequence editing, which we apply to motion capture sequences.

Manuscript from author [PDF]

ES2018-164

Properties of adv$^{-1}$ – Adversarials of Adversarials

Nils Worzyk, Oliver Kramer

Abstract
Neural networks are very successful in the domain of image processing, but they are still vulnerable to adversarial images: carefully crafted images that fool the neural network during image classification. Several attacks already exist to create such adversarial images, so the transition from original images to adversarial images is well understood. In this paper we apply adversarial attacks to adversarial images. These new images are called adv$^{-1}$. The goal is to investigate the transition from adversarial images to adv$^{-1}$ images. This knowledge can be used to (1) identify adversarial images and (2) find the original class of adversarial images.

Manuscript from author [PDF]

ES2018-96

An analysis of subtask-dependency in robot command interpretation with dilated CNNs

Manfred Eppe, Tayfun Alpay, Fares Abawi, Stefan Wermter

Abstract
In this paper, we tackle sequence-to-tree transduction for language processing with neural networks implementing several subtasks, namely tokenization, semantic annotation, and tree generation. Our research question is how the individual subtasks influence the overall end-to-end learning performance in case of a convolutional network with dilated perceptive fields. We investigate a benchmark problem for robot command interpretation and conclude that dilation has a strong positive effect for performing character-level transduction and for generating parsing trees.

Manuscript from author [PDF]

ES2018-200

Image retrieval and ranking through Deep Comparative Neural Networks

Aymen Cherif, Salim Jouili

Abstract
Information retrieval is the task of extracting the most accurate documents from an existing collection with respect to a certain query. We focus our work on instance-level image retrieval. We approach this problem from the point of view of learning to rank. We explore the idea of using the pair-wise ranking model instead of simply providing a similarity measure between a query and a candidate document. We also investigate the ability of this model to capture high-level features that are joint query-document features and category independent.

Manuscript from author [PDF]

ES2018-154

Incremental learning with deep neural networks using a test-time oracle

Alexander Gepperth, Saad Abdullah Gondal

Abstract
We present a simple idea to avoid catastrophic forgetting when training deep neural networks (DNNs) on class-incremental tasks. This means that initial training is conducted on a sub-task described by a dataset $D_1$, whereas re-training is conducted subsequently on a sub-task described by a dataset $D_2$ that is composed of different classes. As our recent work suggests that DNNs perform very poorly at this problem, we propose a simple extension that uses an individually trained readout layer for each sub-task. While this is unproblematic for training, a clustering method is used at test time to determine to which sub-task a sample most likely belongs. Experiments on simple benchmarks derived from MNIST show the effectiveness of this method, for which a dedicated TensorFlow implementation is made available.

Manuscript from author [PDF]

ES2018-162

Image-to-Text Transduction with Spatial Self-Attention

Sebastian Springenberg, Egor Lakomkin, Cornelius Weber, Stefan Wermter

Abstract
Attention mechanisms have been shown to improve recurrent encoder-decoder architectures in sequence-to-sequence learning scenarios. Recently, the Transformer model has been proposed which only applies dot-product attention and omits recurrent operations to obtain a source-target mapping. This paper shows that the concepts of self- and inter-attention can effectively be applied in an image-to-text task. The encoder applies pre-trained convolution and pooling operations followed by self-attention to obtain an image feature representation. Self-attention combines image features of regions based on their similarity before they are made accessible to the decoder through inter-attention.
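
The dot-product self-attention mentioned in this abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: it assumes the region features themselves serve as queries, keys and values (no learned projection matrices), and the names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys and values are the raw features themselves,
    purely for illustration; a real model would learn projections.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    outputs = []
    for q in features:
        # Similarity of this region to every region (including itself).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights = softmax(scores)
        # Each output is a similarity-weighted average of all region features.
        out = [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]
        outputs.append(out)
    return outputs


# Three hypothetical image-region features; the first two are similar.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended = self_attention(regions)
```

Each attended vector is a convex combination of the inputs, so similar regions pull each other's representations together before a decoder would read them.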

Manuscript from author [PDF]

ES2018-88

Hierarchical Recurrent Filtering for Fully Convolutional DenseNets

Jörg Wagner, Volker Fischer, Michael Herman, Sven Behnke

Abstract
Generating a robust representation of the environment is a crucial ability of learning agents. Deep learning based methods have greatly improved perception systems but still fail in challenging situations. These failures are often not solvable on the basis of a single image. In this work, we present a parameter-efficient temporal filtering concept which extends an existing single-frame segmentation model to work with multiple frames. The resulting recurrent architecture temporally filters representations on all abstraction levels in a hierarchical manner, while decoupling temporal dependencies from scene representation. Using a synthetic dataset, we show the ability of our model to cope with data perturbations and highlight the importance of recurrent and hierarchical filtering.

Manuscript from author [PDF]

ES2018-70

Towards cognitive automotive environment modelling: reasoning based on vector representations

Florian Mirus, Terrence C. Stewart, Jörg Conradt

Abstract
In this paper, we propose a novel approach to knowledge representation for automotive environment modelling based on Vector Symbolic Architectures (VSAs). We build a vector representation describing structured information and relations within the current scene based on high-level object-lists perceived by individual sensors. Such a representation can be applied to different tasks with few modifications. In a sample instantiation, we focus on two example tasks, namely driving context classification and simple behavior prediction, to demonstrate the general applicability of our approach. Since the representation allows efficient implementation in Spiking Neural Networks (SNNs), we envision improving the task performance of our approach through online learning.

Manuscript from author [PDF]

ES2018-61

Inferencing based on unsupervised learning of disentangled representations

Tobias Hinz, Stefan Wermter

Abstract
Combining Generative Adversarial Networks (GANs) with encoders that learn to encode data points has shown promising results in learning data representations in an unsupervised way. We propose a framework that combines an encoder and a generator to learn disentangled representations which encode meaningful information about the data distribution without the need for any labels. While current approaches focus mostly on the generative aspects of GANs, our framework can be used to perform inference on both real and generated data points. Experiments on several data sets show that the encoder learns interpretable, disentangled representations which encode descriptive properties and can be used to sample images that exhibit specific characteristics.

Manuscript from author [PDF]

ES2018-32

Dynamic autonomous image segmentation based on Grow Cut

Alexandru-Ion Marinescu, Zoltán Bálint, Laura Dioșan, Anca Andreica

Abstract
The main incentive of this paper is to provide an enhanced approach for 2D medical image segmentation based on the Unsupervised Grow Cut algorithm, a method that requires no prior training. This paper assumes that the reader is, to some extent, familiar with cellular automata and their function as they make up the core of this technique. The benchmarks were performed on 2D MRI images of the heart and chest cavity. We obtained a significant increase in the output quality as compared to classical Unsupervised Grow Cut by using standard measures, based on the existence of accurate ground truth. This increase was obtained by dynamically altering the local threshold parameter. In conclusion, our approach provides the opportunity to become a building block of a computer aided diagnostic system.

Manuscript from author [PDF]

ES2018-169

Continuous convolutional object tracking

Peer Springstübe, Stefan Heinrich, Stefan Wermter

Abstract
Tracking arbitrary objects is a challenging task in visual computing. A central problem is the need to adapt to the changing appearance of an object, particularly under strong transformation and occlusion. We propose a tracking framework that utilises the strengths of Convolutional Neural Networks (CNNs) to create a robust and adaptive model of the object from training data produced during tracking. An incremental update mechanism provides increased performance and reduces training during tracking, allowing its real-time use.

Manuscript from author [PDF]

ES2018-155

Active Learning based on Transfer Learning Techniques for Image Classification

Daniela Onita, Adriana Birlutiu

Abstract
In many imaging tasks only an expert can annotate the data. Though domain experts are available, their labor is expensive and we would like to avoid querying them whenever possible. Our task is to use our resources as efficiently as possible for a learning task. There are various ways of working in cases of labelled data shortage. This type of learning problem can be approached with Active and Transfer Learning techniques. Active Learning and Transfer Learning have demonstrated their efficiency and ability to train accurate models with a significantly reduced amount of training data in many real-life applications. In this paper we investigate the combination of Active and Transfer Learning for building an efficient algorithm for image classification. The experimental results show that by combining active and transfer learning, we can learn faster with fewer labels on a target domain than by random selection.

Manuscript from author [PDF]

ES2018-141

Near-optimal facial emotion classification using a WiSARD-based weightless system

Leopoldo Lusquino Filho, Felipe França, Priscila Lima

Abstract
The recognition of facial expressions through the use of a WiSARD-based n-tuple classifier is explored in this work. The competitiveness of this weightless neural network is tested in the specific challenge of identifying emotions from photos of faces, limited to the six basic emotions described in the seminal work of Ekman and Friesen (1977) on the identification of facial expressions. The current state of the art for this problem uses a convolutional neural network (CNN), with accuracies of 100% and 99.6% on the Cohn-Kanade and MMI datasets, respectively; the proposed WiSARD-based architecture reaches accuracies of 100% and 99.4% on the same datasets.

Manuscript from author [PDF]

ES2018-142

Spatial pooling as feature selection method for object recognition

Murat Kirtay, Lorenzo Vannucci, Ugo Albanese, Alessandro Ambrosano, Egidio Falotico, Cecilia Laschi

Abstract
This paper reports our work on object recognition by using the spatial pooler of Hierarchical Temporal Memory (HTM) as a method for feature selection. To perform the recognition task, we employed this pooling mechanism to select features from the COIL-100 dataset. We benchmarked the results against state-of-the-art feature extraction methods while using different amounts of training data (from 5% to 45%). The results indicate that the proposed method is effective for object recognition with a low amount of training data, a regime in which hand-engineered state-of-the-art feature extraction methods show limitations.

Manuscript from author [PDF]



Interaction and User Integration in Machine Learning for Information Visualisation


ES2018-3

Information visualisation and machine learning: latest trends towards convergence

Benoît Frénay, Bruno Dumas, John A. Lee

Abstract
Many methods have been developed in machine learning (ML) for information visualisation (infovis). For example, PCA, MDS, t-SNE and improvements are standard tools to reduce the dimensionality of high dimensional datasets for visualisation purposes. However, multiple other means are regularly used in the field of infovis when tackling datasets with high dimensionality. Letting the user manipulate the visualisation is one of these means, either through selection, navigation or filtering. Introducing manipulation of the visualisation also integrates the user as a core aspect of a given system. In the context of machine learning, beyond the informational and exploratory use of infovis, users' feedback can for example be highly informational to drive the dimensionality reduction process. This special session of the ESANN conference is a followup of the special session on "Information Visualisation and Machine Learning: Techniques, Validation and Integration" at ESANN 2016. It aims to gather researchers that integrate users in the core of ML methods for infovis. New algorithms and frameworks are welcome, as well as experimental use cases that bring new insight in the integration of interaction and user integration in ML for infovis. This special session aims to provide practitioners from both communities a common forum of discussion where issues at the crossroads of machine learning and information visualisation could be discussed.

Manuscript from author [PDF]

ES2018-74

VisCoDeR: A tool for visually comparing dimensionality reduction algorithms

Rene Cutura, Stefan Holzer, Michaël Aupetit, Michael Sedlmair

Abstract
We propose VisCoDeR, a tool that leverages comparative visualization to support learning and analyzing different dimensionality reduction (DR) methods. VisCoDeR offers two modes. The Discover mode allows several DR results to be compared qualitatively by juxtaposing and linking the resulting scatterplots. The Explore mode allows hundreds of differently parameterized DR results to be analyzed quantitatively. We present use cases showing that our approach helps to understand similarities and differences between DR algorithms.

Manuscript from author [PDF]

ES2018-158

G-Rap: interactive text synthesis using recurrent neural network suggestions

Udo Schlegel, Eren Cakmak, Juri Buchmüller, Daniel Keim

Abstract
Finding the best neural network configuration for a given goal can be challenging, especially when it is not possible to assess the output quality of a network automatically. We present G-Rap, an interactive interface based on Visual Analytics principles for comparing outputs of multiple RNNs for the same training data. G-Rap enables an iterative result generation process that allows a user to work productively while evaluating the outputs with contextual statistics at the same time. We demonstrate the applicability of G-Rap using the example of interactive music lyrics generation.

Manuscript from author [PDF]

ES2018-47

Interactive dimensionality reduction of large datasets using interpolation

Ignacio Diaz-Blanco, Daniel Pérez, Abel A. Cuadrado, Diego Garcia-Perez, Dominguez Manuel

Abstract
In this work we present an approach to achieve interactive dimensionality reduction (iDR) on large datasets. The main idea of the paper relies on using generalized regression neural network (GRNN) interpolation to obtain massive out-of-sample projections from iDR projections obtained on a reduced sample of the original dataset. The proposed method allows fluid iDR interaction on datasets between 45 and 100 times larger than with the original DR method for similar latencies, while still achieving good distance preservation. The paper includes a rank-based comparison between the proposed method and the DR method used alone for different datasets and parameter values.
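
To make the GRNN interpolation idea concrete, here is a minimal sketch in plain Python. It relies on the standard form of a GRNN as a Gaussian-kernel weighted average of training targets; the function name `grnn_project`, the bandwidth `sigma` and the toy data are assumptions for illustration, not details taken from the paper.

```python
import math


def grnn_project(x, sample_points, sample_projections, sigma=1.0):
    """Interpolate a low-dimensional projection for an out-of-sample point x.

    sample_points: high-dimensional points already projected by the DR method.
    sample_projections: their low-dimensional projections.
    The result is a Gaussian-kernel weighted average of the known projections.
    Note: if x is extremely far from every sample, all weights may underflow
    to zero; this sketch does not guard against that case.
    """
    weights = []
    for p in sample_points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    dim = len(sample_projections[0])
    return [
        sum(w * y[j] for w, y in zip(weights, sample_projections)) / total
        for j in range(dim)
    ]


# Hypothetical reduced sample: two 3-D points with known 2-D projections.
sample_points = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
sample_projections = [[0.0, 0.0], [1.0, 1.0]]

midpoint = grnn_project([0.5, 0.5, 0.5], sample_points, sample_projections)
```

Because only the reduced sample goes through the DR method, the remaining points can be placed by this cheap weighted average, which is what would keep interaction latency low in such a scheme.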

Manuscript from author [PDF]



Nonlinear dimensionality reduction


ES2018-185

Perplexity-free t-SNE and twice Student tt-SNE

Cyril de Bodt, Dounia Mulders, Michel Verleysen, John A. Lee

Abstract
In the fields of dimensionality reduction and data visualisation, t-SNE has recently become a very popular method. In this paper, we propose two variants to the Gaussian neighbourhoods used to characterise the neighbourhoods around each high-dimensional datum in t-SNE. A first alternative is to use t distributions, just as they are already used in the low-dimensional embedding space; a variable degree of freedom accounts for the intrinsic dimensionality of data. The second variant relies on compounds of Gaussian neighbourhoods with growing widths, thereby suppressing the need for the user to adjust a single size or perplexity. In both cases, neighbourhoods with heavy tails are thus used in the data space. Experiments show that both variants are competitive, with no extra cost.

Manuscript from author [PDF]

ES2018-173

Generative Kernel PCA

Joachim Schreurs, Johan Suykens

Abstract
Kernel PCA has been shown to be a powerful feature extractor in many applications. Using the Restricted Kernel Machine formulation, a representation using visible and hidden units is obtained. This enables the exploration of new insights and connections between Restricted Boltzmann machines and kernel methods. This paper explores these connections, introducing a generative kernel PCA which can be used to generate new data, as well as denoise a given training dataset. Moreover, relations with linear PCA and a pre-image reconstruction method are introduced in this paper.

Manuscript from author [PDF]

ES2018-76

Extensive assessment of Barnes-Hut t-SNE

Cyril de Bodt, Dounia Mulders, Michel Verleysen, John A. Lee

Abstract
Stochastic Neighbor Embedding (SNE) and variants are dimensionality reduction (DR) methods able to foil the curse of dimensionality to deliver outstanding experimental results. Mitigating the crowding problem, t-SNE became an extremely popular DR scheme. Its quadratic time complexity in the number of samples is nevertheless unaffordable for big data sets. This motivates its Barnes-Hut (BH) acceleration for large-scale use. Although the latter is faster by orders of magnitude, few studies quantify its DR quality with respect to t-SNE. Extensive comparisons between t-SNE and its BH version are conducted using neighborhood preservation-based criteria. Both methods perform very similarly, suggesting the superiority of the BH scheme thanks to its reduced time complexity.
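
As an illustration of what a neighborhood preservation criterion measures, the sketch below computes the average overlap between K-nearest-neighbour sets in the data space and in the embedding. It is one common criterion of this family and is assumed here for illustration; the abstract does not specify the exact criteria used, and the function names are hypothetical.

```python
def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        ]
        dists.sort()  # ties are broken by index, so the result is deterministic
        result.append({j for _, j in dists[:k]})
    return result


def neighbourhood_agreement(high, low, k):
    """Average fraction of k-nearest neighbours shared by both spaces (0 to 1)."""
    high_nn = knn_indices(high, k)
    low_nn = knn_indices(low, k)
    shared = sum(len(h & l) for h, l in zip(high_nn, low_nn))
    return shared / (k * len(high))


# Toy data: a 1-D embedding that keeps every nearest neighbour, and one that keeps none.
high = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
good = [[0.0], [1.0], [5.0], [6.0]]
bad = [[0.0], [5.0], [1.0], [6.0]]

score_good = neighbourhood_agreement(high, good, k=1)
score_bad = neighbourhood_agreement(high, bad, k=1)
```

Computing such a score for both the exact and the accelerated embedding of the same data gives a direct, quantitative way to compare their quality.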

Manuscript from author [PDF]

ES2018-41

Understanding wafer patterns in semiconductor production with variational auto-encoders

Tiago Santos, Roman Kern

Abstract
Semiconductor manufacturing processes critically depend on hundreds of highly complex process steps, which may cause critical deviations in the end-product. Hence, a better understanding of wafer test data patterns, which represent stress tests conducted on devices in semiconductor material slices, may lead to an improved production process. However, the shapes and types of these wafer patterns, as well as their relation to single process steps, are unknown. In a first step to address these issues, we tailor and apply a variational auto-encoder (VAE) to wafer pattern images. We find the VAE's generator allows for explorative wafer pattern analysis, and its encoder provides an effective dimensionality reduction algorithm, which, in a clustering application, performs better than several baselines such as t-SNE and yields interpretable clusters of wafer patterns.

Manuscript from author [PDF]



Classification


ES2018-53

Feature noise tuning for resource efficient Bayesian Network Classifiers

Laura Isabel Galindez Olascoaga, Jonas Vlasselaer, Wannes Meert, Marian Verhelst

Abstract
Emerging portable applications require always-on sensing technologies to continuously monitor the environment and their user's needs. Yet the high power consumption that results from this continuous sensing often hampers these systems' always-on functionality. In this paper we propose a hardware-aware Machine Learning scheme that exploits a device's ability to trade off the quality of its sensors against its power consumption. We introduce a technique that extends Bayesian Network classifiers with hardware description nodes that encode the probabilistic relation between sensory features and their degraded versions. We show how this makes it possible to tune the hardware device's power consumption versus inference accuracy trade-off space with fine granularity, resulting in operating points that achieve significant power savings at almost no accuracy loss. This is empirically shown on various Machine Learning benchmarking datasets.

Manuscript from author [PDF]

ES2018-97

Reliable Patient Classification in Case of Uncertain Class Labels Using a Cross-Entropy Approach

Andrea Villmann, Marika Kaden, Sascha Saralajew, Wieland Hermann, Thomas Villmann

Abstract
Classification learning crucially depends on correct label information in the training data. We consider the problem in which the respective uncertainty can neither be neglected nor be approximated by a statistical model. In the proposed approach each training sample is equipped with a certainty value reflecting the probability of the label correctness. This information is used in the learning process for the classifier. For this purpose, we adopt the cross-entropy cost function from deep learning for a modified learning vector quantization model. We show the usefulness of this knowledge integration in medical diagnostic data analysis, using the detection of Wilson's disease as an example.

Manuscript from author [PDF]

ES2018-108

Behaviour-based working memory capacity classification using recurrent neural networks

Mazen Salous, Felix Putze

Abstract
A user's working memory capacity is a crucial factor for successful Human Computer Interaction. While reliable tests for working memory capacity are available, they are time-consuming, stressful, and not well-integrated into HCI applications. This paper presents a classifier based on Long Short Term Memory networks to exploit sparse temporal dependencies in behavioural data, collected in a complex, memory-intense interaction task, to classify working memory capacity. A cognitive user simulation is introduced to generate additional training data episodes that follow the behaviour of existing real data. We show that the classifier outperforms a linear baseline especially for short segments of data.

Manuscript from author [PDF]

ES2018-118

Structuring and Solving Multi-Criteria Decision Making Problems using Artificial Neural Networks: a smartphone recommendation case

Victor Amaral De Sousa, Anthony Simonofski, Monique Snoeck, Ivan Jureta

Abstract
Several techniques can be used to solve multi-criteria decision making (MCDM) problems and to provide a global ranking of the alternatives considered. However, in a context with a high number of alternatives and where decision criteria relate to soft goals, the decision problem is particularly hard to solve. This paper analyzes the use of artificial neural networks to improve the relevance of the ranking of alternatives delivered by MCDM problem-solving techniques. Afterwards, a model using a combination of artificial neural networks and of the weighted sum model, a particular MCDM problem-solving technique, is built to recommend smartphones.
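
The weighted sum model named in this abstract scores each alternative as the weighted sum of its criterion values and ranks alternatives by that score. The sketch below shows only this classical step, not the paper's neural-network combination; the criteria, weights and phone names are hypothetical, and criterion values are assumed to be normalised to [0, 1] with higher meaning better.

```python
def weighted_sum_ranking(alternatives, weights):
    """Rank alternatives by the weighted sum of their criterion values.

    alternatives: {name: {criterion: normalised value}}
    weights: {criterion: weight}, assumed to sum to 1.
    Returns a list of (name, score) pairs, best first.
    """
    scores = {
        name: sum(weights[c] * values[c] for c in weights)
        for name, values in alternatives.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria and smartphones, for illustration only.
weights = {"battery": 0.5, "camera": 0.3, "price": 0.2}
phones = {
    "phone_a": {"battery": 0.8, "camera": 0.6, "price": 0.4},
    "phone_b": {"battery": 0.5, "camera": 0.9, "price": 0.9},
}

ranking = weighted_sum_ranking(phones, weights)
```

With these numbers `phone_b` ranks first, even though `phone_a` wins on the most heavily weighted criterion, which shows how the global ranking depends on the full weight vector.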

Manuscript from author [PDF]

ES2018-127

Efficient accuracy estimation for instance-based incremental active learning

Christian Limberg, Heiko Wersing, Helge Ritter

Abstract
Estimating a system's accuracy is crucial for applications of incremental learning. In this paper, we introduce the Distogram Estimation (DGE) approach to estimate the accuracy of instance-based classifiers. By calculating relative distances to samples it is possible to train an offline regression model, capable of predicting the classifier's accuracy on unseen data. Our approach requires only a few supervised samples for training and can instantaneously be applied to unseen data afterwards. We evaluate our method on five benchmark data sets and for a robot object recognition task. Our algorithm clearly outperforms two baseline methods both for random and active selection of incremental training examples.

Manuscript from author [PDF]

ES2018-168

Boolean kernels for interpretable kernel machines

Mirko Polato, Fabio Aiolli

Abstract
Most of the machine learning (ML) community's efforts in the last decades have been devoted to improving the power and the prediction quality of ML models at the expense of their interpretability. However, ML is now becoming more and more ubiquitous, and there is increasing demand for models that can be interpreted. To this end, in this work we propose a method for extracting explanation rules from a kernel machine. The core idea is based on using kernels with feature spaces composed of logical propositions. On top of that, a search algorithm tries to retrieve the most relevant features/rules that can be used to explain the trained model. Experiments on several benchmarks and artificial datasets show the effectiveness of the proposed approach.

Manuscript from author [PDF]

ES2018-181

The minimum effort maximum output principle applied to Multiple Kernel Learning

Ivano Lauriola, Mirko Polato, Fabio Aiolli

Abstract
The Multiple Kernel Learning (MKL) paradigm aims at learning the representation from data reducing the effort devoted to the choice of kernel's hyperparameters. Typically, the resulting kernel is obtained as the maximal margin combination of a set of base kernels. When too expressive base kernels are provided to the MKL algorithm, the solution found by these algorithms can overfit data. In this paper, we propose a novel MKL algorithm which takes into consideration the expressiveness of the obtained representation in its objective function in such a way that a trade-off between large margins and simple hypothesis spaces can be found. Moreover, an empirical comparison against hard baselines and state-of-the-art MKL methods on several real-world datasets is presented showing the merits of the proposed algorithm especially with respect to the robustness to overfitting.

Manuscript from author [PDF]

ES2018-113

One-class Autoencoder approach to classify Raman spectra outliers

Katharina Hofer-Schmitz, Phuong-Ha Nguyen, Kristian Berwanger

Abstract
We present a one-class anomaly detector for Raman spectra based on a (deep) autoencoder. Omitting preprocessing of the spectra, we use raw data of our main class to learn the reconstruction, with many typical noise sources automatically reduced as a result. To separate anomalies from the norm class, we use several independent statistical metrics for a majority vote. Our evaluation shows an F1-score of up to 99%.

Manuscript from author [PDF]

ES2018-161

Radar Based Pedestrian Detection using Support Vector Machine and the Micro Doppler Effect

Joao Victor Bruneti Severino, Alessandro Zimmer, Leandro dos Santos Coelho, Roberto Zanetti Freire

Abstract
Based on alarming statistics related to both pedestrian fatalities and injuries in traffic accidents, this paper presents the development of a pedestrian detection method for an Advanced Driving Assistance System (ADAS). Using a 79 GHz automotive radar, a signal processing application that can identify pedestrians early in short-range situations using a Support Vector Machine (SVM) is presented and evaluated in order to improve the velocity resolution for the extraction of micro-Doppler effects. By assuming pre-processing multiobjective optimization, promising results in terms of velocity resolution and measuring time were obtained, improving the accuracy of the classifier.

Manuscript from author [PDF]

ES2018-198

Opposite neighborhood: a new method to select reference points of minimal learning machines

Madson Dias, Lucas Sousa, Ajalmar Rocha Neto, Amauri Souza Junior

Abstract
This paper introduces a new approach to select reference points of minimal learning machines (MLMs) for classification tasks. The MLM training procedure is related to the selection of a subset of the training set, named reference points (RPs), that is used to build a mapping between the input geometric configurations and their corresponding labels. We propose a method, named opposite neighborhood (ON), that explores the Euclidean distance in input space to select RPs. Experiments were performed using UCI data sets. The proposal was able to both reduce the number of reference points and achieve competitive performance when compared to conventional approaches for selecting RPs.

Manuscript from author [PDF]

ES2018-60

A neural network cost function for highly class-imbalanced data sets

David Twomey, Denise Gorse

Abstract
We introduce a new cost function for the training of a neural network classifier in conditions of high class imbalance. This function, based on an approximate confusion matrix, represents a balance of sensitivity and specificity and is thus well suited to problems where cost functions such as the mean squared error and cross entropy are prone to overpredicting the majority class. The benefit of the new measure is shown on a set of common class-imbalanced datasets using the Matthews Correlation Coefficient as an independent scoring measure.
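
The Matthews Correlation Coefficient used here as the scoring measure can be computed directly from the four entries of a binary confusion matrix. The sketch below uses the standard formula; the function name and the example counts are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: undefined cases (e.g. only one class ever predicted) score 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced problem: 10 positives, 990 negatives.
# A classifier that always predicts the majority class is 99% accurate
# but has no correlation with the true labels.
always_majority = matthews_corrcoef(tp=0, tn=990, fp=0, fn=10)
perfect = matthews_corrcoef(tp=10, tn=990, fp=0, fn=0)
```

The majority-class predictor scores 0 despite its high accuracy, which is why a measure of this kind suits the class-imbalanced setting the abstract describes.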

Manuscript from author [PDF]

ES2018-78

Self-learning assembly systems during ramp-up

Ralf Schönherr, Maximilian Knaller, Markus Philipp

Abstract
Achieving the targeted production volume during the ramp-up phase plays an important role for the economic success of manufacturing companies. But ramp-up phases are usually characterized by a high degree of uncertainty, as many situations arise for the first time. These unexpected events lead to errors and faults in automated processes which cause losses in the overall production volume. This paper proposes an architecture for assembly systems to predict and avoid faults of the assembly process during ramp-up through self-learning. Different algorithms for self-learning components are evaluated. By using real production data sets, neural networks could be identified as the best solution.

Manuscript from author [PDF]

ES2018-87

Feasibility based Large Margin Nearest Neighbor metric learning

Babak Hosseini, Barbara Hammer

Abstract
Large margin nearest neighbor (LMNN) is a metric learner which optimizes the performance of the popular k-NN classifier. However, its resulting metric relies on pre-selected target neighbors. In this paper, we address the feasibility of LMNN's optimization constraints regarding these target points, and introduce a mathematical measure to evaluate the size of the feasible region of the optimization problem. We enhance the optimization framework of LMNN by a weighting scheme which prefers data triplets which yield a larger feasible region. This increases the chances to obtain a good metric as the solution of LMNN's problem. We evaluate the performance of the resulting feasibility-based LMNN algorithm using synthetic and real datasets. The empirical results show an improved accuracy for different types of datasets in comparison to regular LMNN.

Manuscript from author [PDF]

ES2018-101

Combining latent tree modeling with a random forest-based approach, for genetic association studies

Christine Sinoquet, Kamel Mekhnacha

Abstract
Association studies have been widely used to discover the genetic basis of complex phenotypes. However, standard univariate tests, and their alternatives, do not fully exploit the dependences between genetic markers. In this paper, we propose Sylva, a hybrid approach in which a random forest framework based on embedded trees benefits from a probabilistic graphical model. The latter is a collection of tree-shaped Bayesian networks with latent variables. We extensively compared Sylva and T-Trees, on simulated and real data. Sylva outperforms the already highly performant T-Trees, in a vast majority of cases.

Manuscript from author [PDF]

ES2018-63

Graph based neural networks for automatic classification of multiple sclerosis clinical courses

Francesco Calimeri, Aldo Marzullo, Claudio Stamile, Giorgio Terracina

Abstract
Automatic classification of biomedical imaging has become an important field of research within the scientific community in recent years. Indeed, advances in image acquisition and processing techniques, along with the success of novel deep learning methods and architectures, have provided considerable support for obtaining better biomarkers for the characterization of several diseases, and brain diseases in particular. In this work we propose a novel neural network approach that is applied to graphs generated from MRI data in order to make predictions about the clinical status of a patient. Results show high performance in classification tasks and open interesting perspectives in the field.

Manuscript from author [PDF]


Regression and recommendation systems


ES2018-72

Extreme Minimal Learning Machine

Tommi Kärkkäinen

Abstract
Extreme Learning Machine (ELM) and Minimal Learning Machine (MLM) are nonlinear and scalable machine learning techniques with a randomly generated basis. Both techniques share a step where a matrix of weights for the linear combination of the basis is recovered. In MLM, the kernel in this step corresponds to distance calculations between the training data and a set of reference points, whereas in ELM a transformation with a sigmoidal activation function is most commonly used. MLM then needs an additional interpolation step to estimate the actual distance-regression based output. A natural combination of these two techniques is proposed here, i.e., to use the distance-based kernel characteristic of MLM in ELM. The experimental results show the promising potential of the proposed technique.

Manuscript from author [PDF]

ES2018-182

Learning with a Fisher surrogate loss in a small data regime

Moussab Djerrab, Alexandre Garcia,

Abstract
We introduce a novel framework, Output Fisher Embedding Regression (OFER), that makes use of a Fisher vector representation of the outputs and provides predictions by solving an appropriate pre-image problem. OFER takes advantage of the implicit structure of the marginal probability distribution of the output to improve prediction performance. Although the proposed approach is general and versatile, we focus on the Gaussian mixture model for modelling the output data and design a closed-form solution for the corresponding pre-image problem. Numerical results are presented on a drug activity prediction task and on a multi-class classification problem cast as a semantic regression problem, and show the relevance of the approach in the small data regime.

Manuscript from author [PDF]

ES2018-94

Fast Power system security analysis with Guided Dropout

Benjamin Donnot, Isabelle Guyon, Antoine Marot, Marc Schoenauer, Patrick Panciatici

Abstract
We propose a new method to efficiently compute load-flows (the steady-state of the power-grid for given productions, consumptions and grid topology), substituting conventional simulators based on differential equation solvers. We use a deep feed-forward neural network trained with load-flows precomputed by simulation. Our architecture makes it possible to train a network on so-called "n-1" problems, in which load flows are evaluated for every possible line disconnection, and then generalize to "n-2" problems without re-training (a clear advantage because of the combinatorial nature of the problem). To that end, we developed a technique bearing similarity with "dropout", which we named "guided dropout".

Manuscript from author [PDF]

ES2018-51

Neural Networks for Implicit Feedback Datasets

Josef Feigl, Martin Bogdan

Abstract
Most users typically interact with products only through implicit feedback such as clicks or purchases rather than explicit user-provided information like product ratings. Learning to rank products according to individual preferences using only this implicit feedback can be helpful to make useful recommendations. In this paper, a neural network architecture to solve collaborative filtering problems for personalized rankings on implicit feedback datasets is presented. It is shown how a layer of constant weights forces the network to learn pairwise rankings. Additionally, similarities between the network and a matrix factorization model trained with Bayesian Personalized Ranking are proven. The experiments indicate state-of-the-art performance for the task of personalized ranking.

Manuscript from author [PDF]
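
The pairwise ranking objective mentioned in the abstract of ES2018-51 can be illustrated with a minimal sketch of the Bayesian Personalized Ranking loss for a matrix factorization model. This is not the network architecture proposed in the paper; the function name and the toy embedding vectors are hypothetical.

```python
import math

def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """BPR loss for one (user, interacted item, non-interacted item) triple.

    Scores are dot products of embeddings; the loss is -log(sigmoid(diff)),
    where diff is the score of the interacted item minus the other score.
    """
    score_pos = sum(u * i for u, i in zip(user_vec, pos_item_vec))
    score_neg = sum(u * i for u, i in zip(user_vec, neg_item_vec))
    diff = score_pos - score_neg
    # Numerically stable evaluation of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Hypothetical 2-dimensional embeddings.
user = [1.0, 0.0]
clicked_item = [1.0, 0.0]
unseen_item = [0.0, 1.0]

loss_correct_order = bpr_loss(user, clicked_item, unseen_item)
loss_wrong_order = bpr_loss(user, unseen_item, clicked_item)
```

The loss is small when the interacted item is scored above the non-interacted one and grows when the order is reversed, so minimizing it pushes the model towards correct pairwise rankings.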

ES2018-152

Regularize and explicit collaborative filtering with textual attention

Charles-Emmanuel Dias, Vincent Guigue, Patrick Gallinari

Abstract
Recommendation can be seen as tantamount to blind sentiment analysis, i.e. a sentiment prediction without text data. In that sense, we aim at encoding priors on users and items while reading their reviews, using a deep architecture with personalized attention modeling. Following this idea, we build a hybrid hierarchical sentiment classifier which is then used as a recommender system in inference.

Manuscript from author [PDF]

ES2018-183

Adaptive random forests for data stream regression

Heitor Murilo Gomes, Jean Paul Barddal, Luis Eduardo Boiko, Albert Bifet

Abstract
Data stream mining is a hot topic in the machine learning community that tackles the problem of learning and updating predictive models as new data becomes available over time. Even though several new methods are proposed every year, most focus on the classification task and overlook the regression task. In this paper, we propose an adaptation to the Adaptive Random Forest so that it can handle regression tasks, namely ARF-Reg. ARF-Reg is empirically evaluated and compared to existing works of the area, thus highlighting its applicability in different data stream scenarios.

Manuscript from author [PDF]

ES2018-33

Cache-efficient Gradient Descent Algorithm

Imen Chakroun, Tom Vander Aa, Thomas Ashby

Abstract
Best practice when using Stochastic Gradient Descent (SGD) suggests randomising the order of training points and streaming the whole set through the learner. This results in extremely low temporal locality of access to the training set and thus makes minimal use of the small, fast layers of memory in an HPC memory hierarchy. While mini-batch SGD is often used to control the noise on the gradient and make convergence smoother and easier to identify than with SGD, it suffers from the same extremely low temporal locality. In this paper we introduce Sliding Window SGD (SW-SGD), which uses temporal locality of training point access in an attempt to combine the advantages of SGD with those of mini-batch SGD by leveraging HPC memory hierarchies. We give initial results on a classification and a regression problem using the MNIST and CHEMBL datasets, showing that memory hierarchies can be used to improve the performance of gradient algorithms.

Manuscript from author [PDF]
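
One possible reading of the sliding-window idea in the abstract of ES2018-33 is sketched below: each update uses the newly loaded mini-batch together with the most recently visited batches, which would still sit in fast memory. The abstract does not specify the algorithm at this level of detail, so the window policy, the function name, and all parameter values here are assumptions made for illustration on a one-parameter regression problem.

```python
import random
from collections import deque

def sliding_window_sgd(xs, ys, batch_size=4, window_batches=3,
                       lr=0.1, epochs=20, seed=0):
    """Fit y = w * x with gradients averaged over a window of recent batches."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    # The deque keeps only the most recent batches (assumed to be cached).
    window = deque(maxlen=window_batches)
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            window.append(indices[start:start + batch_size])
            active = [i for batch in window for i in batch]
            # Gradient of the mean squared error over all points in the window.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in active) / len(active)
            w -= lr * grad
    return w

# Hypothetical noise-free data generated from y = 3x.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [3.0 * x for x in xs]
w_hat = sliding_window_sgd(xs, ys)
```

Only one new batch is fetched per step, while the gradient is averaged over up to three batches, which is the trade-off between gradient noise and memory traffic that the abstract describes.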

ES2018-73

Sensitivity analysis for predictive uncertainty

Stefan Depeweg, José Miguel Hernández-Lobato, Steffen Udluft, Thomas Runkler

Abstract
We derive a novel sensitivity analysis of input variables for predictive epistemic and aleatoric uncertainty. We use Bayesian neural networks with latent variables as a model class and illustrate the usefulness of our sensitivity analysis on real-world datasets. Our method increases the interpretability of complex black-box probabilistic models.

Manuscript from author [PDF]

ES2018-81

Revisiting FISTA for Lasso: Acceleration Strategies Over The Regularization Path

Alejandro Catalina, Carlos M. Alaíz, José R. Dorronsoro

Abstract
In this work we revisit the FISTA algorithm for Lasso, showing that recent acceleration techniques may greatly improve its basic version, resulting in a much more competitive procedure. We study the contribution of the different improvement strategies, showing experimentally that the final version becomes much faster than the standard one.

Manuscript from author [PDF]
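
For reference, the basic FISTA iteration for the Lasso objective 0.5 * ||Ax - b||^2 + lam * ||x||_1, which the abstract of ES2018-81 takes as its starting point, can be sketched as follows. This is the standard version only, not the accelerated variants studied in the paper; the function names, the crude step-size bound, and the toy problem are assumptions made for illustration.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm for a single coordinate."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def fista_lasso(A, b, lam, n_iter=200):
    """Basic FISTA for 0.5 * ||Ax - b||^2 + lam * ||x||_1 (A as a list of rows)."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm is an upper bound on the Lipschitz constant
    # of the gradient (the largest eigenvalue of A^T A), so 1 / lipschitz is a
    # safe, if conservative, step size.
    lipschitz = sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    y = x[:]
    t = 1.0
    for _ in range(n_iter):
        residual = [sum(A[i][j] * y[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step at the extrapolated point, then soft-thresholding.
        x_new = [soft_threshold(y[j] - grad[j] / lipschitz, lam / lipschitz)
                 for j in range(n_cols)]
        # Momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j])
             for j in range(n_cols)]
        x, t = x_new, t_new
    return x

# Toy problem with an identity design matrix: the exact Lasso solution is the
# soft-thresholding of b, i.e. [2.0, 0.0] for lam = 1.
solution = fista_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

The second coefficient is set exactly to zero, which shows the sparsity induced by the L1 penalty.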


Shallow and Deep models for transfer learning and domain adaptation


ES2018-5

Shallow and Deep Models for Domain Adaptation problems

Siamak Mehrkanoon, Matthew Blaschko, Johan Suykens

Abstract
Manual labeling of sufficient training data for diverse application domains is a costly, laborious task and often prohibitive. Therefore, designing models that can leverage rich labeled data in one domain and be applicable to a different but related domain is highly desirable. In particular, domain adaptation or transfer learning algorithms seek to generalize a model trained in a source domain to a new target domain. Recent years have witnessed increasing interest in these types of models due to their practical importance in real-life applications. In this paper we provide a brief overview of recent techniques with both shallow and deep architectures for domain adaptation models.

Manuscript from author [PDF]

ES2018-145

Unsupervised domain adaptation of deep object detectors

Debjeet Majumdar, Vinay Namboodiri

Abstract
Domain adaptation has been understood and adopted in vision. Recently, with the advent of deep learning, a number of techniques have been proposed for deep learning based domain adaptation. However, the methods proposed have been used for adapting object classification techniques. In this paper, we address domain adaptation for object detection, which is more commonly used. We adapt deep adaptation techniques for the Faster R-CNN framework, specifically recent techniques based on Gradient Reversal and on Maximum Mean Discrepancy (MMD) reduction. Among them we show that the MK-MMD based method, when used appropriately, provides the best results. We analyze our model in a standard real-world setting, using Pascal VOC as source and MS-COCO as target, and show a gain of 2.5 mAP at an IoU of 0.5 over a model trained on the source only. We show that this improvement is statistically significant.

Manuscript from author [PDF]
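
The Maximum Mean Discrepancy referred to in the abstract of ES2018-145 measures how far apart two feature distributions are. The minimal sketch below computes the biased empirical estimate of the squared MMD with a single Gaussian kernel on scalar features. It is not the multi-kernel (MK-MMD) variant or the detector pipeline used in the paper; the function names, the bandwidth, and the sample values are hypothetical.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar features."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Hypothetical source-domain and target-domain feature values.
source_features = [0.0, 0.1, -0.1]
shifted_target = [3.0, 3.1, 2.9]

mmd_same = mmd_squared(source_features, source_features)
mmd_shifted = mmd_squared(source_features, shifted_target)
```

The estimate is zero when both samples coincide and grows as the domains drift apart, which is why reducing it during training encourages domain-invariant features.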


Machine Learning and Data Analysis in Astroinformatics


ES2018-2

Machine learning and data analysis in astroinformatics

Michael Biehl, Kerstin Bunte, Giuseppe Longo, Peter Tino

Abstract
Astroinformatics is a new discipline at the crossroads of astronomy, advanced statistics and computer science. With next generation sky surveys, space missions and modern instrumentation, astronomy will enter the Petascale regime, raising the demand for advanced computer science techniques with hard- and software solutions for data management, analysis, efficient automation and knowledge discovery. This tutorial reviews important developments in astroinformatics over the past years and discusses some relevant research questions and concrete problems. The contribution ends with a short review of the special session papers in these proceedings, as well as perspectives and challenges for the near future.

Manuscript from author [PDF]

ES2018-125

Anomaly detection in star light curves using hierarchical Gaussian processes

Haoyan Chen, Tom Diethe, Niall Twomey, Peter Flach

Abstract
Here we examine astronomical time-series called light-curve data, which represent the brightness of celestial objects over a period of time. We focus specifically on the task of finding anomalies in three sets of light-curves of periodic variable stars. We employ a hierarchical Gaussian process to create a general and stable model of time series for anomaly detection, and apply this approach to the light curve problem. Hierarchical Gaussian processes require only a few more parameters than Gaussian processes and incur negligible additional complexity. Moreover, the additional parameters are objectively optimised in a principled probabilistic framework. Experimentally, our approach outperforms several baselines and highlights several anomalous light curves in the datasets investigated.

Manuscript from author [PDF]

ES2018-130

Latent representations of transient candidates from an astronomical image difference pipeline using Variational Autoencoders

Pablo Huijse, Nicolas Astorga, Pablo Estevez, Giuliano Pignata

Abstract
The Chilean Automatic Supernovae SEarch (CHASE) is a survey designed to detect early Supernovae. In this paper we explore deep autoencoders to obtain a compressed latent space for a large transient candidate database from the CHASE image difference pipeline. Compared to conventional methods, the latent variables obtained with variational autoencoders preserve more information and are more discriminative towards real astronomical transients.

Manuscript from author [PDF]

ES2018-86

Globular Cluster Detection in the Gaia Survey

Mohammad Mohammadi, Reynier Peletier, Frank-Michael Schleif, Nicolai Petkov, Kerstin Bunte

Abstract
Existing algorithms for the detection of stellar structures in the Milky Way are most efficient when full phase-space and color information is available. This, however, is not often the case. The Gaia satellite has recently begun surveying the whole sky and is providing highly accurate positions for more than one billion sources. In this contribution we propose two independent strategies to find globular clusters in this database, based on magnitude distributions only. One approach is a nearest neighbor retrieval and the other an anomaly detection. Both techniques are able to find known globular clusters within our test frame consistently, as well as additional candidates for further investigation.

Manuscript from author [PDF]

ES2018-100

Stellar formation rates in galaxies using machine learning models

Michele Delli Veneri, Stefano Cavuoti, Massimo Brescia, Giuseppe Riccio, Giuseppe Longo

Abstract
Global Stellar Formation Rates or SFRs are crucial to constrain theories of galaxy formation and evolution. SFRs are usually estimated via spectroscopic observations, which require too much precious telescope time and therefore cannot match the needs of modern precision cosmology. We therefore propose a novel method to estimate SFRs for large samples of galaxies using a variety of supervised ML models.

Manuscript from author [PDF]

ES2018-115

Prototype-based analysis of GAMA galaxy catalogue data

Aleke Nolte, Lingyu Wang, Michael Biehl

Abstract
We present a prototype-based machine learning analysis of labeled galaxy catalogue data containing parameters from the Galaxy and Mass Assembly (GAMA) survey. Using both an unsupervised and supervised method, the Self-Organizing Map and Generalized Relevance Matrix Learning Vector Quantization, we find that the data does not fully support the popular visual-inspection-based galaxy classification scheme employed to categorize the galaxies. In particular, only one class, the Little Blue Spheroids, is consistently separable from the other classes. In a proof-of-concept experiment, we present the galaxy parameters that are most discriminative for this class.

Manuscript from author [PDF]


Deep Learning in Bioinformatics and Medicine


ES2018-1

Bioinformatics and medicine in the era of deep learning

Davide Bacciu, Paulo Lisboa, José D. Martín, Ruxandra Stoean, Alfredo Vellido

Abstract
Many of the current scientific advances in the life sciences have their origin in the intensive use of data for knowledge discovery. In no area is this so clear as in bioinformatics, led by technological breakthroughs in data acquisition technologies. It has been argued that bioinformatics could quickly become the field of research generating the largest data repositories, beating other data-intensive areas such as high-energy physics or astroinformatics. Over the last decade, deep learning has become a disruptive advance in machine learning, giving new life to the long-standing connectionist paradigm in artificial intelligence. Deep learning methods are ideally suited to large-scale data and, therefore, they should be ideally suited to knowledge discovery in bioinformatics and biomedicine at large. In this brief paper, we review key aspects of the application of deep learning in bioinformatics and medicine, drawing from the themes covered by the contributions to an ESANN 2018 special session devoted to this topic.

Manuscript from author [PDF]

ES2018-128

Controlling biological neural networks with deep reinforcement learning

Jan Wülfing, Sreedhar Saseendran Kumar, Joschka Boedecker, Martin Riedmiller, Ulrich Egert

Abstract
Targeted interaction with networks in the brain is of immense therapeutic relevance. The highly dynamic nature of neuronal networks and changes with progressive diseases create an urgent need for closed-loop control. Without adequate mathematical models of such complex networks, however, it remains unclear how tractable control problems can be formulated for neurobiological systems. Reinforcement learning (RL) could be a promising tool to address such challenges. Nevertheless, RL methods have rarely been applied to live, plastic neural networks. This study demonstrates that RL methods could help control response properties of biological neural networks with little prior knowledge of their complex dynamics.

Manuscript from author [PDF]

ES2018-14

Learning compressed representations of blood samples time series with missing data

Filippo Maria Bianchi, Karl Øyvind Mikalsen, Robert Jenssen

Abstract
Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low dimensional vectorial representations of MTS that preserve important data characteristics, but cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluster Kernel (TCK), a kernel that accounts for missingness patterns in MTS. Via kernel alignment, we incorporate TCK in the autoencoder to improve the learned representations in presence of missing data. We consider a classification problem of MTS with missing values, representing blood samples of patients with surgical site infection. With our approach, rather than with a standard autoencoder, we learn representations in low dimensions that can be classified better.

Manuscript from author [PDF]

ES2018-59

Sleep staging with deep learning: a convolutional model

Isaac Fernández-Varela, Dimitrios Athanasakis, Samuel Parsons, Elena Hernández-Pereira, Vicente Moret-Bonillo

Abstract
Sleep staging is a crucial task in the context of sleep studies that involves the analysis of multiple signals, thus being a very tedious and complex task. Even for a trained expert, it can take several hours to annotate the signals recorded from a patient's sleep during a single night. To solve this problem several automatic methods have been developed, although most of them rely on hand-engineered features. To address the inherent problems of this approach, in this work we explore the possibility of solving this problem with a deep learning network that can self-learn the relevant features from the signals. In particular, we propose a convolutional network that obtains higher performance than previous methods, achieving an average precision of 0.91, recall of 0.90, and F1 score of 0.90.

Manuscript from author [PDF]

ES2018-82

Interpreting deep learning models for ordinal problems

José P. Amorim, Inês Domingues, Pedro Henriques Abreu, João Santos

Abstract
Machine learning algorithms have evolved by exchanging simplicity and interpretability for accuracy, which prevents their adoption in critical tasks such as healthcare. Progress can be made by improving interpretability of complex models while preserving performance. This work introduces an extension of interpretable mimic learning which teaches interpretable models to mimic predictions of complex deep neural networks, not only on binary problems but also in ordinal settings. The results show that the mimic models have comparable performance to Deep Neural Network models, with the advantage of being interpretable.

Manuscript from author [PDF]

ES2018-62

Non-negative Matrix Factorization for Medical Imaging

Miguel Atencia, Ruxandra Stoean

Abstract
A non-negative matrix factorization approach to dimensionality reduction is proposed to aid classification of images. The original images can be stored as lower-dimensional columns of a matrix that hold degrees of belonging to feature components, so they can be used in the training phase of the classification at lower runtime and without loss in accuracy. The extracted features can be visually examined and images reconstructed with limited error. The proof of concept is performed on a benchmark of handwritten digits, followed by the application to histopathological colorectal cancer slides. Results are encouraging, though dealing with real-world medical data raises a number of issues.

Manuscript from author [PDF]

ES2018-93

Multi-omics data integration using cross-modal neural networks

Ioana Bica, Petar Velickovic, Hui Xiao,

Abstract
Successful integration of multi-omics data for prediction tasks can bring significant advantages to precision medicine and to understanding molecular systems. This paper introduces a novel neural network architecture for exploring and integrating modalities in omics datasets, especially in scenarios with a limited number of training examples available. The proposed cross-modal neural network achieves up to 99% accuracy on omics datasets and it can be reliably used as a tool for performing inference. Moreover, we show how analysis of the weights and activations in the network can give us biological insights into understanding which genes are most relevant for the decision process and how different types of omics influence each other.

Manuscript from author [PDF]

ES2018-131

DEEP: decomposition feature enhancement procedure for graphs

Van Dinh Tran, Nicolò Navarin, Alessandro Sperduti

Abstract
When dealing with machine learning on graphs, one of the most successful approaches is that of kernel methods. Depending on whether one is interested in predicting properties of graphs (e.g. graph classification) or properties of nodes in a single graph (e.g. graph node classification), different kernel functions should be adopted. In the last few years, several kernels for graphs have been defined in the literature that extract local features from the input graphs, obtaining both efficiency and state-of-the-art predictive performance. Recently, some work has been done in this direction also regarding graph node kernels, but the majority of the graph node kernels available in the literature consider only global information, which may be suboptimal for many tasks. In this paper, we propose a procedure that makes it possible to transform a local graph kernel into a kernel for nodes in a single, huge graph. We apply a specific instantiation to the task of disease gene prioritization from the bioinformatics domain, improving the state of the art in many diseases.

Manuscript from author [PDF]

ES2018-163

Deep Echo State Networks for Diagnosis of Parkinson's Disease

Claudio Gallicchio, Alessio Micheli, Luca Pedrelli

Abstract
In this paper, we introduce a novel approach for diagnosis of Parkinson's Disease (PD) based on deep Echo State Networks (ESNs). The identification of PD is performed by analyzing the whole time-series collected from a tablet device during the sketching of spiral tests, without the need for feature extraction and data preprocessing. We evaluated the proposed approach on a public dataset of spiral tests. The results of experimental analysis show that deepESNs perform significantly better than a shallow ESN model. Overall, the proposed approach obtains state-of-the-art results in the identification of PD on this kind of temporal data.

Manuscript from author [PDF]

ES2018-180

Capturing variabilities from Computed Tomography images with Generative Adversarial Networks (GANs)

Umair Javaid, John A. Lee

Abstract
With the advent of Deep Learning (DL) techniques, especially Generative Adversarial Networks (GANs), data augmentation and generation are quickly evolving domains that have raised much interest recently. However, DL techniques are data demanding and, since medical data is not easily accessible, they suffer from data insufficiency. To deal with this limitation, different data augmentation techniques are used. Here, we propose a novel unsupervised data-driven approach for data augmentation that can generate 2D Computed Tomography (CT) images using a simple GAN. The generated CT images have good global and local features of a real CT image and can be used to augment the training datasets for effective learning. In this proof-of-concept study, we show that our proposed solution using GANs is able to capture some of the global and local CT variabilities. Our network is able to generate visually realistic CT images and we aim to further enhance its output by scaling it to a higher resolution and potentially from 2D to 3D.

Manuscript from author [PDF]

ES2018-199

Pollen grain recognition using convolutional neural network

Natalia Khanzhina, Evgeny Putin, Andrey Filchenkov, Elena Zamyatina

Abstract
This paper addresses two problems: automated recognition of pollen species and counting of pollen grains in an image obtained with a light microscope. Automation of pollen recognition is required in several domains, including allergy and asthma prevention in medicine and honey quality control in the nutrition industry. We propose a deep learning solution based on a convolutional neural network for classification, feature extraction and image segmentation. Our approach achieves state-of-the-art results in terms of accuracy: 99.8% for 5 species and 95.9% for 11 species.

Manuscript from author [PDF]


Randomized Neural Networks


ES2018-6

Randomized Recurrent Neural Networks

Claudio Gallicchio, Alessio Micheli, Peter Tino

Abstract

Manuscript from author [PDF]

ES2018-49

Bidirectional deep-readout echo state networks

Filippo Maria Bianchi, Simone Scardapane, Sigurd Løkse, Robert Jenssen

Abstract
We propose a deep architecture for the classification of multivariate time series. By means of a recurrent and untrained reservoir we generate a vectorial representation that embeds temporal relationships in the data. To improve the memorization capability, we implement a bidirectional reservoir, whose last state captures also past dependencies in the input. We apply dimensionality reduction to the final reservoir states to obtain compressed fixed size representations of the time series. These are subsequently fed into a deep feedforward network trained to perform the final classification. We test our architecture on benchmark datasets and on a real-world use-case of blood samples classification. Results show that our method performs better than a standard echo state network and, at the same time, achieves results comparable to a fully-trained recurrent network, but with a faster training.

Manuscript from author [PDF]

ES2018-105

Forecasting Business Failure in Highly Imbalanced Distribution based on Delay Line Reservoir

Ali Rodan, Pedro A. Castillo, Hossam Faris, , A.M. Mora, Huthaifa Jawazneh

Abstract
Bankruptcy is a critical financial problem that affects a high number of companies around the world. Thus, in recent years an increasing number of researchers have tried to solve it by applying different machine-learning models as powerful tools for the different economic agents related to the company. In this work, we propose the use of a simple deterministic delay line reservoir (DLR) state space, combined with three popular classification algorithms (J48, k-NN, and MLP), as an efficient and accurate solution to the bankruptcy prediction problem. The proposed approach is evaluated on a real-world dataset collected from Spanish companies. The results obtained show that the proposed models have a higher predictive ability than traditional classification approaches (without the DLR reservoir state), resulting in a suitable and efficient alternative approach to solve this complex problem.

Manuscript from author [PDF]

ES2018-172

Estimation of the Human Concentration using Echo State Networks

Hikmat Dashdamirov, Sebastián Basterrech

Abstract
We introduce a very simple and portable device for estimating human concentration. We developed a Brain-Computer Interface system based on EEG signals which is able to produce highly accurate predictions of human activities. We consider two types of mental activity: one requiring high concentration and one requiring relaxation. We show that it is possible to estimate human concentration with few brain signals. The classification problem is solved using Neural Networks. In particular, we obtain a very accurate classifier using the fast and robust Echo State Network method.

Manuscript from author [PDF]

ES2018-176

Quantifying the Reservoir Quality using Dimensionality Reduction Techniques

Tomas Burianek, Sebastián Basterrech

Abstract
The Echo State Network is a particular type of Recurrent Neural Network that combines principles from kernels, linear regression and dynamical systems. The network has randomly initialized hidden-to-hidden weights (the reservoir) that are kept fixed during training. The reservoir projects the input patterns onto a feature map. Here, we present a correlation analysis between the input space and the feature map. We use a dimensionality reduction technique (Sammon Mapping) for representing the input space. We show a correlation between the Sammon energy and the model accuracy, which can be useful for defining good reservoir topologies.

Manuscript from author [PDF]



Clustering and feature selection


ES2018-134

Scalable robust clustering method for large and sparse data

Joonas Hämäläinen, Tommi Kärkkäinen, Tuomo Rossi

Abstract
Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a general algorithm is described, and the accuracy and scalability of a distributed implementation of the algorithm are tested. The results obtained confirm the viability of the proposed approach.

Manuscript from author [PDF]

ES2018-22

Clustering with decision trees: divisive and agglomerative approach

Lauriane Castin, Benoît Frénay

Abstract
Decision trees are mainly used to perform classification tasks. Samples are submitted to a test in each node of the tree and guided through the tree based on the result. Decision trees can also be used to perform clustering, with a few adjustments. On the one hand, new split criteria must be discovered to construct the tree without knowledge of the sample labels. On the other hand, new algorithms must be applied to merge sub-clusters at leaf nodes into actual clusters. In this paper, new split criteria and agglomeration algorithms are developed for clustering, with results comparable to other existing clustering techniques.

Manuscript from author [PDF]

ES2018-16

Comparison of cluster validation indices with missing data

Marko Niemelä, Sami Äyrämö, Tommi Kärkkäinen

Abstract
Clustering is an unsupervised machine learning technique which aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious and, for this purpose, various cluster validation indices have been suggested. Recently some studies reviewing validation indices have been provided, but no experiments with missing data are available yet. In this paper, the performance of ten well-known indices on ten synthetic data sets with various ratios of missing values is measured using clustering based on squared Euclidean and city block distances. The original indices are modified for the city block distance in a novel way. Experiments illustrate different degrees of stability for the indices with respect to the missing data.

Manuscript from author [PDF]

ES2018-122

Efficient approximate representations for computationally expensive features

Raul Santos-Rodriguez, Niall Twomey

Abstract
High computational complexity is often a barrier to achieving desired representations in resource-constrained settings. This paper introduces a simple and computationally cheap method of approximating complex features. We do so by carefully constraining the architecture of a neural network and regressing from raw data to the desired feature representation. Our analysis focuses on spectral features, and demonstrates how low-capacity networks can capture the end-to-end dynamics of cascaded composite functions. Not only do approximating neural networks simplify the analysis pipeline, but our approach produces feature representations up to 20 times more quickly. Excellent feature fidelity is achieved in our experimental analysis with feature approximations, and we also report nearly indistinguishable predictive performance when comparing exact and approximate representations.

Manuscript from author [PDF]

ES2018-167

Regularised maximum-likelihood inference of mixture of experts for regression and clustering

Bao Tuyen Huynh, Faicel Chamroukhi

Abstract
Variable selection is fundamental to high-dimensional statistical modeling, and is particularly challenging in unsupervised modeling, including mixture models. We propose a regularised maximum-likelihood inference of the Mixture of Experts model which is able to deal with potentially correlated features and encourages sparse models in potentially high-dimensional scenarios. We develop a hybrid Expectation-Majorization-Maximization (EM/MM) algorithm for model fitting. Unlike state-of-the-art regularised ML inference [1,2], the proposed modeling does not require an approximation of the regularisation. The proposed algorithm automatically obtains sparse solutions without thresholding, and includes coordinate descent updates that avoid matrix inversion. An experimental study shows the capability of the algorithm to retrieve sparse solutions and to fit models in model-based clustering of regression data.

Manuscript from author [PDF]

ES2018-179

Feature selection for label ranking

Noelia Sánchez-Maroño, Beatriz Pérez-Sánchez

Abstract
Over the last few years, feature selection and label ranking have attracted considerable attention in Artificial Intelligence research. Feature selection has been applied to many machine learning problems with excellent results. However, studies on its combination with label ranking are underdeveloped. This paper presents a novel work that uses feature selection filters as a preprocessing step for label ranking. Experimental results show a significant reduction, up to 33%, in the number of features used for the label ranking problems, while the performance results remain competitive in terms of similarity measure.

Manuscript from author [PDF]

ES2018-57

A novel filter algorithm for unsupervised feature selection based on a space filling measure

Mohamed Laib, Mikhaïl Kanevski

Abstract
The research proposes a novel filter algorithm for the unsupervised feature selection problems based on a space filling measure. A well-known criterion of space filling design, called the coverage measure, is adapted to dimensionality reduction problems. Originally, this measure was developed to judge the quality of a space filling design. In this work it is used to reduce the redundancy in data. The proposed algorithm is evaluated on simulated data with several scenarios of noise injection. Furthermore, a comparison with some benchmark methods of feature selection is performed on real UCI datasets.

Manuscript from author [PDF]



Mathematical aspects of learning, and reinforcement learning


ES2018-45

Asymptotic statistics for multilayer perceptron with ReLu hidden units

Joseph Rynkiewicz

Abstract
We consider regression models involving multilayer perceptrons (MLP) with rectified linear unit (ReLu) functions for hidden units. It is a difficult task to study statistical properties of such models for several reasons: a first difficulty is that these activation functions are not differentiable everywhere; a second is that in practice these models may be heavily overparametrized. In general, the estimation of the parameters of the MLP is done by minimizing a cost function; we focus here on the sum of square errors (SSE), which is the standard cost function for regression purposes. In this framework, we can characterize the asymptotic behavior of the SSE of estimated models, which gives information on the possible overfitting of such models. This task is done using a recent and very flexible methodology introduced to deal with models with a loss of identifiability. So, we do not have to assume that a true model exists or that a finite set of parameters realizes the best regression function.

Manuscript from author [PDF]

ES2018-139

Local Rademacher Complexity Machine

Luca Oneto, Sandro Ridella, Davide Anguita

Abstract
In this paper we present the Local Rademacher Complexity Machine, a transposition of the Local Rademacher Complexity Theory into a learning algorithm. By exploiting a series of real world small-sample datasets, we show the advantages of our proposal with respect to the Support Vector Machines, i.e. the transposition of the milestone results of V. N. Vapnik and A. Chervonenkis into a learning algorithm.

Manuscript from author [PDF]

ES2018-174

A sharper bound on the Rademacher complexity of margin multi-category classifiers

Khadija Musayeva, Fabien Lauer, Yann Guermeur

Abstract
One of the main open problems in the theory of margin multi-category pattern classification is the dependency of a guaranteed risk on the number C of categories, the sample size m and the margin parameter gamma. This paper derives a new bound on the probability of error of margin multi-category classifiers under minimal learnability assumptions. It improves the dependency on C over the state of the art. This is achieved through the introduction of a new Sauer-Shelah lemma.

Manuscript from author [PDF]

ES2018-123

Slowness-based neural visuomotor control with an Intrinsically motivated Continuous Actor-Critic

Muhammad Burhan Hafez, Matthias Kerzel, Cornelius Weber, Stefan Wermter

Abstract
In this paper, we present a new visually guided exploration approach for autonomous learning of visuomotor skills. Our approach uses hierarchical Slow Feature Analysis for unsupervised learning of efficient state representation and an Intrinsically motivated Continuous Actor-Critic learner for neuro-optimal control. The system learns online an ensemble of local forward models and generates an intrinsic reward based on the learning progress of each learned forward model. Combined with the external reward, the intrinsic reward guides the system’s exploration strategy. We evaluate the approach for the task of learning to reach an object using raw pixel data in a realistic robot simulator. The results show that the control policies learned with our approach are significantly better both in terms of length and average reward than those learned with any of the baseline algorithms.

Manuscript from author [PDF]

ES2018-177

A variable projection method for block term decomposition of higher-order tensors

Guillaume Olikier, Pierre-Antoine Absil, Lieven De Lathauwer

Abstract
Higher-order tensors have become popular in many areas of applied mathematics such as statistics, scientific computing, signal processing or machine learning, notably thanks to the many possible ways of decomposing a tensor. In this paper, we focus on the best approximation in the least-squares sense of a higher-order tensor by a block term decomposition. Using variable projection, we express the tensor approximation problem as a minimization of a cost function on a Cartesian product of Stiefel manifolds. We present numerical experiments where variable projection makes a steepest-descent method approximately twice as fast.

Manuscript from author [PDF]

ES2018-50

Reinforcement Learning for High-Frequency Market Making

Ye-Sheen Lim, Denise Gorse

Abstract
In this paper we present the first practical application of reinforcement learning to optimal market making in high-frequency trading. States, actions, and reward formulations unique to high-frequency market making are proposed, including a novel use of the CARA utility as a terminal reward for improving learning. We show that the optimal policy trained using Q-learning outperforms state-of-the-art market making algorithms. Finally, we analyse the optimal reinforcement learning policies, and the influence of the CARA utility from a trading perspective.

Manuscript from author [PDF]



Emerging trends in machine learning: beyond conventional methods and data


ES2018-4

Emerging trends in machine learning: beyond conventional methods and data

Luca Oneto, Nicolò Navarin, Michele Donini, Davide Anguita

Abstract
Recently, new promising theoretical results, techniques, and methodologies have attracted the attention of many researchers and have broadened the range of applications in which machine learning can be effectively applied in order to extract useful and actionable information from the huge amount of heterogeneous data produced every day by an increasingly digital world. Examples of these methods and problems are: learning under privacy and anonymity constraints, learning from structured, semi-structured, multi-modal (heterogeneous) data, constructive machine learning, reliable machine learning, learning to learn, mixing deep and structured learning, semantics-enabled recommender systems, reproducibility and interpretability in machine learning, human-in-the-loop, adversarial learning. The focus of this special session is to attract both solid contributions and preliminary results which show the potential and the limitations of new ideas, refinements, or contaminations between the different fields of machine learning and other fields of research in solving real-world problems. Both theoretical and practical results are welcome to our special session.

Manuscript from author [PDF]

ES2018-89

Finding the most interpretable MDS rotation for sparse linear models based on external features

Adrien Bibal, Rebecca Marion, Benoît Frénay

Abstract
One approach to interpreting multidimensional scaling (MDS) embeddings is to estimate a linear relationship between the MDS dimensions and a set of external features. However, because MDS only preserves distances between instances, the MDS embedding is invariant to rotation. As a result, the weights characterizing this linear relationship are arbitrary and difficult to interpret. This paper proposes a procedure for selecting the most pertinent rotation for interpreting a 2D MDS embedding.
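The rotation invariance mentioned in this abstract can be checked numerically. The sketch below uses a hypothetical four-point 2D embedding and an arbitrary 40-degree angle; it only illustrates that rotating an embedding leaves all pairwise distances unchanged, and does not reproduce the paper's rotation-selection procedure.

```python
import math


def rotate(points, angle):
    """Rotate 2D points counter-clockwise by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs, in a fixed order."""
    return [math.dist(p, q)
            for i, p in enumerate(points)
            for q in points[i + 1:]]


# Hypothetical 2D embedding (made-up coordinates, for illustration only).
embedding = [(0.0, 0.0), (1.0, 0.0), (0.5, 2.0), (-1.5, 0.75)]
rotated = rotate(embedding, math.radians(40))

# The coordinates change, but every pairwise distance is preserved.
before = pairwise_distances(embedding)
after = pairwise_distances(rotated)
```

Because the coordinates change while the distances do not, any linear model fitted on the embedding axes would get different weights for each rotation, which is the ambiguity the abstract refers to.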

Manuscript from author [PDF]

ES2018-112

Mixture of Hidden Markov Model as Tree Encoder

Davide Bacciu, Daniele Castellana

Abstract
The paper introduces a new probabilistic tree encoder based on a mixture of Bottom-up Hidden Markov Tree Models. The ability to recognise similar structures in data is experimentally assessed in both clustering and classification tasks. The results obtained in this preliminary experiment suggest that this model can be used successfully to compress the tree information content in a fixed representation.

Manuscript from author [PDF]

ES2018-143

Set point thresholds from topological data analysis and an outlier detector

Alessio Carrega

Abstract
We provide an algorithm for unsupervised or semi-supervised learning to determine, once the input settings are given, a very easily described zone of optimal execution settings for a production. A region is very easily described if anyone can determine whether a point is inside it and select a point on it with a certain range of choice. This can be applied both in production optimization and in predictive maintenance. Part of the method is based on a topological data analysis tool: Mapper. We also provide a method to detect outliers on new data.

Manuscript from author [PDF]

ES2018-119

Differential private relevance learning

Johannes Brinkrolf, Kolja Berger, Barbara Hammer

Abstract
Digital information is collected daily in growing volumes. Mutual benefits drive the demand for the exchange and publication of data among parties. However, it is often unclear how to handle these data properly when they contain sensitive information. Differential privacy has become a powerful principle for privacy-preserving data analysis tasks in the last few years, since it entails a formal privacy guarantee for such settings. This is obtained by a separation of the utility of the database and the risk of an individual losing his/her privacy. In this contribution, we introduce the Laplace mechanism and a stochastic gradient descent methodology which guarantee differential privacy. Then, we show how these paradigms can be incorporated into two popular machine learning algorithms, namely GLVQ and GMLVQ. We demonstrate the results of privacy-preserving LVQ based on three benchmarks.
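For readers unfamiliar with the Laplace mechanism named in this abstract, a minimal sketch of its standard form follows: a numeric query result is released after adding Laplace noise with scale sensitivity/epsilon. The function names and the example values are assumptions for illustration; how the paper applies the mechanism to GLVQ and GMLVQ is not shown here.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent Exp(1) draws, multiplied by `scale`,
    follows a Laplace distribution with location 0 and that scale.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release `true_value` with noise calibrated to sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: a counting query (sensitivity 1) released with epsilon = 0.5.
rng = random.Random(0)
noisy_count = laplace_mechanism(true_value=10.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller epsilon gives a larger noise scale, so stronger privacy is paid for with less accurate answers.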

Manuscript from author [PDF]

ES2018-124

On aggregation in ranking median regression

Stéphan Clémençon, Anna Korba

Abstract
In the present era of personalized customer services and recommender systems, predicting the preferences of an individual/user over a set of items indexed by $\{1,\; \ldots,\; n\}$, $n\geq 1$, based on its characteristics, modelled as a r.v. $X$ say, is a ubiquitous issue. Though easy to state, this predictive problem, referred to as ranking median regression (RMR in short), is very difficult to solve in practice. The major challenge lies in the fact that, here, the (discrete) output space is the symmetric group $\mathfrak{S}_n$, composed of all permutations of $\{1,\; \ldots,\; n\}$, of explosive cardinality $n!$, and which is not a subset of a vector space. It is thus far from straightforward to build predictive rules taking their values in $\mathfrak{S}_n$, except by means of ranking aggregation techniques implemented at a local level, as proposed in [YWL10] or [CKS17bis]. However, such local learning techniques exhibit high instability, and the main goal of this paper is to investigate to what extent Kemeny ranking aggregation of randomized RMR rules may remedy this drawback. Beyond a theoretical analysis establishing its validity, the relevance of this novel ensemble learning technique is supported by experimental results.

Manuscript from author [PDF]

ES2018-202

Temporal transfer learning for drift adaptation

Daegun Won, Peter Jansen, Jaime Carbonell

Abstract
Whereas detecting and adapting to concept drift has been well studied, predicting temporal drift of decision boundaries has received much less attention. This paper proposes a method for drift prediction, drift projection, and active-learning for adjusting the projected decision boundary so as to regain accuracy with minimal additional labeled samples. The method works with different underlying learning algorithms. Results on several data sets with translational and rotational drift and corresponding boundary projection show regained accuracy with significantly fewer labeled samples, even in the presence of noisy drift.

Manuscript from author [PDF]

ES2018-140

LANN-DSVD: A privacy-preserving distributed algorithm for machine learning

Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Beatriz Pérez-Sánchez, Marcelo Gómez-Casal

Abstract
In the Big Data era, new challenges have arisen for the machine learning field related to Volume (a high number of samples or variables), Velocity, etc., making many classic and brilliant methods no longer applicable. One of these concerns derives from privacy issues when data is distributed and cannot be shared. In this paper we present the LANN-DSVD algorithm, a non-iterative method for One-Layer Neural Networks that allows distributed learning while guaranteeing privacy among locations. Moreover, it is parameter-free and provides incremental learning, making it very suitable for managing huge and/or continuous data. Results demonstrate its competitiveness in both efficiency and efficacy.

Manuscript from author [PDF]

ES2018-192

Vector Field Based Neural Networks

Daniel Vieira, Fabio Rangel, Fabrício Firmino, Joao Paixao

Abstract
A novel Neural Network architecture is proposed using the mathematically and physically rich idea of vector fields as hidden layers to perform nonlinear transformations in the data. The data points are interpreted as particles moving along a flow defined by the vector field which intuitively represents the desired movement to enable classification. The architecture moves the data points from their original configuration to a new one following the streamlines of the vector field with the objective of achieving a final configuration where classes are separable. An optimization problem is solved through gradient descent to learn this vector field.

Manuscript from author [PDF]



Temporal data, sequences and incremental learning


ES2018-156

Non-Negative Tensor Dictionary Learning

Abraham Traoré, Maxime Berar, Alain Rakotomamonjy

Abstract
A challenge faced by dictionary learning and non-negative matrix factorization is to efficiently model, in a context of feature learning, temporal patterns for data presenting sequential (two-dimensional) structure such as spectrograms. In this paper, we address this issue through tensor factorization. For this purpose, we make clear the connection between dictionary learning and tensor factorization when several examples are available. From this connection, we derive a novel (supervised) learning problem which induces emergence of temporal patterns in the learned dictionary. Obtained features are compared in a classification framework with those obtained by NMF and achieve promising results.

Manuscript from author [PDF]

ES2018-157

An extension of nonstationary fuzzy sets to heteroskedastic fuzzy time series

Marcos Antonio Alves, Petrônio Cândido de Lima e Silva, Carlos Alberto Severiano Junior, Gustavo Linhares Vieira, Frederico Gadelha Guimarães, Hossein Javedani Sadaei

Abstract
Most applications deal with the unconditional variance of the time series. Fuzzy time series allow inexpensive computation for forecasting dynamic processes and uncertainties. In this paper we have extended the concept of nonstationary fuzzy sets to Fuzzy Time Series, termed Nonstationary Fuzzy Time Series (NSFTS). While some models require new data before adapting, the NSFTS is capable of adapting to heteroskedastic time series. In the experiments, NSFTS outperformed other known FTS methods with Box-Cox transformations available. Statistical tests on three different datasets indicate that the results achieved by the proposed model are either superior or non-inferior to other FTS models.

Manuscript from author [PDF]

ES2018-120

Surprisal-based activation in recurrent neural networks

Tayfun Alpay, Fares Abawi, Stefan Wermter

Abstract
Learning hierarchical abstractions from sequences is a challenging and open problem for Recurrent Neural Networks (RNNs). This is mainly due to the difficulty of detecting features that span long distances and occur at different frequencies. In this paper, we address this challenge by introducing surprisal-based activation, a novel method to preserve activations contingent on encoding-based self-information. The preserved activations can be considered as temporal shortcuts with perfect memory. We evaluate surprisal-based activation on language modelling by testing it on the Penn Treebank corpus and find that it can improve performance when compared to a baseline RNN.

Manuscript from author [PDF]

ES2018-116

K-spectral centroid: extension and optimizations

Brieuc Conan-Guez, Alain Gély, Lydia Boudjeloud-Assala, Alexandre Blansché

Abstract
In this work, we address the problem of unsupervised classification of large time series datasets. We focus on K-Spectral Centroid (KSC), a k-means-like model devised for time series clustering. KSC relies on a custom dissimilarity measure between time series, which is invariant to time shifting and Y-scaling. KSC has two downsides: firstly, its dissimilarity measure only makes sense for non-negative time series; secondly, the KSC algorithm is relatively demanding in terms of computation time. In this paper, we present a natural extension of the KSC dissimilarity measure to time series of arbitrary signs. We show that this new measure is a metric distance. We propose to speed up this extended KSC (EKSC) thanks to four exact optimizations. Finally, we compare EKSC to a similar model, K-Shape, on real-world datasets.
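To make the shift- and scale-invariance of the KSC dissimilarity concrete, the sketch below implements the form in which the original KSC measure is commonly stated: the minimum over shifts q and scaling factors alpha of ||x - alpha * y_q|| / ||x||. This form is not given in the abstract, and the zero-padded shift, the `max_shift` parameter and the function names are assumptions. The extended measure (EKSC) and the four optimizations proposed in the paper are not reproduced.

```python
import math


def shift(series, q):
    """Shift a series right by q steps (left if q < 0), padding with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def ksc_distance(x, y, max_shift):
    """Minimum over shifts and scalings of ||x - alpha * y_q|| / ||x||."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0.0:
        raise ValueError("x must not be all zeros")
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        denom = sum(v * v for v in y_q)
        # For a fixed shift, the optimal scaling has a closed form:
        # alpha = <x, y_q> / ||y_q||^2 (set to 0 if y_q is all zeros).
        alpha = sum(a * b for a, b in zip(x, y_q)) / denom if denom > 0 else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best


# y is x scaled by 2 and delayed by one step, so the distance is (numerically) zero.
x = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
y = [0.0, 0.0, 2.0, 6.0, 2.0, 0.0]
d_same_shape = ksc_distance(x, y, max_shift=2)
d_flat = ksc_distance(x, [1.0] * 6, max_shift=0)
```

The exhaustive loop over shifts hints at why the abstract describes KSC as computationally demanding.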

Manuscript from author [PDF]

ES2018-133

Temporal modeling of ALS using longitudinal data and long-short term memory-based algorithm

Aviv Nahon, Boaz Lerner

Abstract
ALS is a neurodegenerative disease where factors such as disease progression rate and pattern vary greatly among patients. Since patient functionality deteriorates over time, we model ALS temporally to mimic the physician's reasoning by incorporating old with new information using a long short-term memory (LSTM) network. We demonstrate that the LSTM achieves a higher accuracy than a random forest in disease state prediction, and improves accuracy with data from additional clinic visits. Being an anytime predictor, our model can help physicians and caregivers to adjust patients' treatment and living environment throughout the disease period, improving patients' quality of life.

Manuscript from author [PDF]

ES2018-190

Person Identification and Discovery With Wrist Worn Accelerometer Data

Ryan McConville, Raul Santos-Rodriguez, Niall Twomey

Abstract
Internet of Things devices with embedded accelerometers continue to grow in popularity. These are often attached to individuals, whether they are a mobile phone in a pocket or a wrist-worn smartwatch, capturing data of a personal nature. In this work we propose a method for person identification using accelerometer data via supervised machine learning techniques. Further, we introduce the first unsupervised method for discovering individuals using the same accelerometer data. We report high performance both in terms of classification and clustering using a publicly available dataset covering a large number of activities of daily living. While this has numerous benefits in tasks such as activity recognition, this work also motivates the debate and discussion around privacy concerns of the analysis of accelerometer data.

Manuscript from author [PDF]

ES2018-203

CDTW-based classification for Parkinson's Disease diagnosis

Nicolas Khoury, Ferhat Attal, Yacine Amirat, Abdelghani Chibani, Samer Mohammed

Abstract
This paper presents a new classification approach for Parkinson's Disease (PD) diagnosis using the Continuous Dynamic Time Warping (CDTW) technique and gait cycle data. These data are the vertical Ground Reaction Force (vGRF) recordings collected from eight force sensors placed in each shoe sole worn by each subject. The proposed approach exploits the principle of the repetition of gait cycle patterns to discriminate healthy subjects from PD subjects. The repetition of gait cycles is evaluated using the similarity of the time series corresponding to stance phases, estimated by applying the CDTW technique. The CDTW distances, extracted from gait cycles, are used as inputs of a binary classifier discriminating healthy subjects from PD subjects. Different classification methods are evaluated, including four supervised methods: K-Nearest Neighbours (K-NN), Decision Tree (DT), Random Forest (RF), and Support Vector Machines (SVM), and two unsupervised ones: Gaussian Mixture Model (GMM) and K-means. The proposed approach compares favorably with a classification based on standard features.

Manuscript from author [PDF]

ES2018-48

Personalizing human activity recognition models using incremental learning

Pekka Siirtola, Heli Koskimäki, Juha Röning

Abstract
In this study, the aim is to personalize inertial sensor data-based human activity recognition models using incremental learning. At first, the recognition is based on a user-independent model. However, when personal streaming data becomes available, the incremental learning-based recognition model can be updated, and therefore personalized, based on the data without user interruption. The incremental learning algorithm used is Learn++, which is an ensemble method that can use any classifier as base classifier. The study compares three different base classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and classification and regression tree (CART). Experiments are based on a publicly open data set and they show that even a small personal training data set can improve the classification accuracy. The improvement is 4.6 percentage points using LDA as base classifier, 2.0 percentage points using QDA, and 2.3 percentage points using CART. However, if the user-independent model used in the first phase of the recognition process is not accurate enough, personalization cannot improve recognition accuracy.

Manuscript from author [PDF]

ES2018-107

Short-term Memory of Deep RNN

Claudio Gallicchio

Abstract
The extension of deep learning towards temporal data processing is gaining increasing research interest. In this paper we investigate the properties of state dynamics developed in successive levels of deep recurrent neural networks (RNNs) in terms of short-term memory abilities. Our results reveal interesting insights that shed light on the nature of layering as a factor of RNN design. Notably, higher layers in a hierarchically organized RNN architecture turn out to be inherently biased towards longer memory spans even prior to training of the recurrent connections. Moreover, in the context of the Reservoir Computing framework, our analysis also points out the benefit of a layered recurrent organization as an efficient approach to improve the memory skills of reservoir models.

Manuscript from author [PDF]

ES2018-104

Effect of context in swipe gesture-based continuous authentication on smartphones

Pekka Siirtola, Jukka Komulainen, Vili Kellokumpu

Abstract
This work investigates how context should be taken into account when conducting continuous authentication of a smartphone user based on touchscreen and accelerometer readings from swipe gestures. The study is based on a publicly open data set consisting of 100 study subjects performing pre-defined reading and navigation tasks while sitting or walking. It is shown that context-specific models are needed for different smartphone usage and human activity scenarios to minimize authentication error. Also, the experimental results suggest that utilization of phone movement improves swipe gesture-based verification performance only when the user is moving.

Manuscript from author [PDF]



Impact of Biases in Big Data


ES2018-7

Impact of Biases in Big Data

Patrick Glauner, Petko Valtchev, Radu State

Abstract
The underlying paradigm of big data-driven machine learning reflects the desire to derive better conclusions from simply analyzing more data, without the necessity of looking at theory and models. Is having simply more data always helpful? In 1936, The Literary Digest collected 2.3M filled-in questionnaires to predict the outcome of that year's US presidential election. The outcome of this big data prediction proved to be entirely wrong, whereas George Gallup only needed 3K handpicked people to make an accurate prediction. Generally, biases occur in machine learning whenever the distributions of training set and test set are different. In this work, we provide a review of different sorts of biases in (big) data sets in machine learning. We provide definitions and discussions of the most commonly appearing biases in machine learning: class imbalance and covariate shift. We also show how these biases can be quantified and corrected. This work is an introductory text for both researchers and practitioners to become more aware of this topic and thus to derive more reliable models for their learning problems.

Manuscript from author [PDF]

ES2018-58

Analysis of imputation bias for feature selection with missing data

Borja Seijo-Pardo, Amparo Alonso-Betanzos, Kristin Bennett, Veronica Bolon-Canedo, Isabelle Guyon, Julie Josse, Mehreen Saeed

Abstract
We study the risk/benefit tradeoff of missing value imputation in the context of feature selection. We caution against using imputation methods that may yield false positives: features not associated with the target becoming dependent as a result of imputation. We also investigate situations in which imputing missing values may be beneficial to reduce false negatives. We use causal graphs to characterize when structural bias arises and introduce a de-biased version of the t-test.

Manuscript from author [PDF]

ES2018-99

Systematics aware learning: a case study in high energy physics

Victor Estrade, Cecile Germain, Isabelle Guyon, David Rousseau

Abstract
Experimental science often has to cope with systematic errors that coherently bias data. We study this issue in the analysis of data produced by experiments of the Large Hadron Collider at CERN as a case of supervised domain adaptation. Systematics-aware learning should create an efficient representation that is insensitive to perturbations induced by the systematic effects. We present an experimental comparison of the adversarial knowledge-free approach and a less data-intensive alternative.

Manuscript from author [PDF]



Optimization and metaheuristics


ES2018-149

Evolutionary RL for Container Loading

Sarmimala Saikia, Richa Verma, Puneet Agarwal, Gautam Shroff, Lovekesh Vig, Ashwin Srinivasan

Abstract
Loading containers onto a ship from a yard is an important part of port operations. Finding the optimal sequence for loading containers is known to be computationally hard and is an example of combinatorial optimization, which leads to the use of simple heuristics in practice. In this paper, we propose an approach that uses a mix of Evolutionary Strategies and Reinforcement Learning (RL) techniques to find an approximation of the optimal solution. The RL-based agent uses a Policy Gradient method, an evolutionary reward strategy and a pool of good (not optimal) solutions to find the approximation. We find that the RL agent learns near-optimal solutions that outperform the heuristic solutions. We also observe that the RL agent assisted by a pool generalizes better to unseen problems than an RL agent without a pool. We present our results on synthetic data as well as real-world data taken from a container terminal. The results validate that our approach performs better than the available heuristic solutions and adapts better to unseen problems.

Manuscript from author [PDF]

ES2018-175

Enhancement of a stochastic Markov-blanket framework with ant colony optimization, to uncover epistasis in genetic association studies

Christine Sinoquet, Clément Niel

Abstract
In association genetics, many studies rely on univariate statistical tests to reveal genotype-phenotype relationships, and are thus prone to miss cases of epistasis (interaction between genes). We designed SMMB (Stochastic Multiple Markov Blankets), and SMMB-ACO, a variant combined with ant colony optimization, to detect epistasis. We compare our proposals with three other methods. SMMB-ACO outperforms the other methods for 50% of simulated datasets. On real datasets, the detection ability of SMMB-ACO is close to that of the best approach, which is a slow method, and SMMB-ACO is second in speed only to a method that performs much worse.

Manuscript from author [PDF]

ES2018-35

Meerkats-inspired Algorithm for Global Optimization Problems

Carlos Eduardo Klein, Leandro dos Santos Coelho

Abstract
Bio-inspired computing has been a relevant topic in scientific, computing and engineering fields in recent years. Most bio-inspired metaheuristics model a specific phenomenon or mechanism, on the basis of which they tackle optimization problems. This paper introduces the meerkats-inspired algorithm (MEA), a novel population-based swarm intelligence algorithm for global optimization in the continuous domain. The performance of MEA is showcased on six classical constrained engineering problems from the literature. Numerical results and comparisons with other state-of-the-art stochastic algorithms are also provided. Analysis of the results reveals that the MEA produced consistent results when compared with other optimizers.

Manuscript from author [PDF]

ES2018-18

Cheetah Based Optimization Algorithm: A Novel Swarm Intelligence Paradigm

Carlos Eduardo Klein, Viviana Cocco Mariani, Leandro dos Santos Coelho

Abstract
New gadgets, systems and advances in technology are bringing today's engineers problems of increasing complexity. To solve those problems, optimization algorithms are emerging to support and even improve the current situation. Several stochastic optimization paradigms called metaheuristics are proposed each year, with inspiration coming from animals, plants, experiments, chemical processes or simply mathematics. In this paper, a cheetah based optimization algorithm (CBA) is proposed, capturing the social behavior of those animals. The proposed CBA is validated against seven known optimizers using three different benchmark problems. Finally, some considerations about research issues and directions in the CBA design are given.

Manuscript from author [PDF]

ES2018-77

Evolutionary Composition of Customized Fault Localization Heuristics

Diogo de-Freitas, Plinio Leitao-Junior, Celso Camilo-Junior, Rachel Harrison

Abstract
Fault localisation is one of the most difficult and costly parts of software debugging. Researchers have tried to automate this process by formulating measures to assess the suspiciousness of code elements. This paper reports an evolutionary approach that non-linearly combines 34 previous measures to formulate a new program-oriented fault localisation heuristic. The method was evaluated with 107 single-bug programs and compared against 35 approaches: 34 spectrum-based heuristics and a previous evolutionary linear combination approach. The experiments have shown that the proposal consistently achieved competitive results relative to the others according to several effectiveness metrics.

Manuscript from author [PDF]

ES2018-90

Order Crossover for the Inventory Routing Problem

Mohamed Salim Amri Sakhri, Mounira Tlili, Hamid Allaoui, Ouajdi Korbaa

Abstract
In this paper, we aim to find a solution that reduces logistical activity costs by using new hybrid meta-heuristics. We develop a genetic algorithm (GA) with a hybrid crossover operator. The operator considered is the Order Crossover (OX); we test our hybrid algorithm on a Periodic Inventory Routing Problem (PIRP). Our study demonstrates the performance of the hybrid OX operator compared with the classic GA, shows the competitiveness of this approach on large-scale instances, and yields better solution quality.
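For readers unfamiliar with the Order Crossover, the following is a minimal sketch of the classic OX operator on permutations. It is not the hybrid variant developed in the paper; the function name, the fixed cut points and the example parents are assumptions chosen for illustration.

```python
def order_crossover(parent_a, parent_b, cut_start, cut_end):
    """Classic OX: keep parent_a[cut_start:cut_end] in place, then fill the
    remaining positions with parent_b's genes in the order they appear,
    starting just after the cut and wrapping around."""
    size = len(parent_a)
    child = [None] * size
    child[cut_start:cut_end] = parent_a[cut_start:cut_end]
    kept = set(parent_a[cut_start:cut_end])

    # Genes of parent_b read from the second cut point onward, wrapping around,
    # skipping those already copied from parent_a.
    donor = [parent_b[(cut_end + i) % size] for i in range(size)]
    fill = [gene for gene in donor if gene not in kept]

    # Empty positions, also visited from the second cut point onward.
    n_empty = size - (cut_end - cut_start)
    positions = [(cut_end + i) % size for i in range(n_empty)]
    for pos, gene in zip(positions, fill):
        child[pos] = gene
    return child


# Example permutations (assumed): the slice [4, 5, 6, 7] is inherited from parent_a.
parent_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent_b = [9, 3, 7, 8, 2, 6, 5, 1, 4]
child = order_crossover(parent_a, parent_b, 3, 7)
```

Because the child is always a valid permutation, OX suits routing problems where each customer must be visited exactly once; in a GA the cut points would normally be drawn at random rather than fixed.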

Manuscript from author [PDF]

[Back to Top]