Photo: Thor Balkhed

Machine Learning is a scientific field at the intersection of statistics, artificial intelligence, and computer science.

The seminars are usually held every fourth Wednesday, 3.15-4.15 pm, in Ada Lovelace (Visionen), B Building, Campus Valla, Linköping.

**Spring 2020**

**Wednesday, February 26, 3.15 pm, 2020**

**TBA**
**Nils Bore**, Dept of Robotics, Perception, and Learning, Royal Institute of Technology (KTH)

**Abstract:** TBA

**Location:** TBA

**Organizer:** Per Sidén

**Wednesday, March 25, 3.15 pm, 2020**

**TBA**
TBA

**Abstract:** TBA

**Location:** TBA

**Organizer:** TBA

**Wednesday, April 22, 3.15 pm, 2020**

**TBA**
TBA

**Abstract:** TBA

**Location:** TBA

**Organizer:** TBA

**Wednesday, May 20, 3.15 pm, 2020**

**TBA**
TBA

**Abstract:** TBA

**Location:** TBA

**Organizer:** TBA

**Fall 2019**

**Wednesday, November 6, 3.15 pm, 2019**

**Deep Generative Models and Missing Data**
**Jes Frellsen**, IT University of Copenhagen

**Abstract:** Deep latent variable models (DLVMs) combine the approximation abilities of deep neural networks and the statistical foundations of generative models. In this talk, we first discuss how these models are estimated: variational methods are commonly used for inference; however, the exact likelihood of these models has been largely overlooked. We show that most unconstrained models used for continuous data have an unbounded likelihood function and discuss how to ensure the existence of maximum likelihood estimates. Then we present a simple variational method, called MIWAE, for training DLVMs when the training set contains missing-at-random data. Finally, we present Monte Carlo algorithms for missing data imputation using the exact conditional likelihood of DLVMs: a Metropolis-within-Gibbs sampler for DLVMs trained on complete datasets and an importance sampler for DLVMs trained on incomplete datasets. For complete training sets, our algorithm consistently and significantly outperforms the usual imputation scheme used for DLVMs. For incomplete training sets, we show that MIWAE-trained models provide accurate single and multiple imputations and are highly competitive with state-of-the-art methods. This is joint work with Pierre-Alexandre Mattei.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Fredrik Lindsten

**Wednesday, October 16, 3.15 pm, 2019**

**Scaling and Generalizing Approximate Bayesian Inference**
**David Blei**, Dept. of Computer Science, Columbia University

**Abstract:** A core problem in statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this talk I review and discuss innovations in variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and Bayesian statistics. It tends to be faster than more traditional methods, such as Markov chain Monte Carlo sampling. After quickly reviewing the basics, I will discuss our recent research on VI. I first describe stochastic variational inference, an approximate inference algorithm for handling massive data sets, and demonstrate its application to probabilistic topic models of millions of articles. Then I discuss black box variational inference, a generic algorithm for approximating the posterior. Black box inference applies easily to many models and requires only minimal mathematical work to implement. I will demonstrate black box inference on deep exponential families---a method for Bayesian deep learning---and describe how it enables powerful tools for probabilistic programming.
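To give a flavor of the "black box" idea sketched in the abstract, the following is a minimal toy score-function gradient estimator in Python. The target and variational family below are my own minimal choices for illustration, not material from the talk: a unit-variance Gaussian target with mean 2 is approximated by a unit-variance Gaussian q with a learnable mean.

```python
# Toy black-box VI: fit q(z; mu) = N(mu, 1) to p(z) ∝ N(2, 1)
# using score-function ("REINFORCE") gradient estimates of the ELBO.
# Illustrative sketch only, not the speaker's implementation.
import numpy as np

rng = np.random.default_rng(1)

def log_p(z):
    # Unnormalized log target density: Gaussian, mean 2, unit variance.
    return -0.5 * (z - 2.0) ** 2

def log_q(z, mu):
    # Unnormalized log variational density: Gaussian, mean mu, unit variance.
    return -0.5 * (z - mu) ** 2

def grad_log_q(z, mu):
    # Score function: gradient of log q with respect to mu.
    return z - mu

mu = 0.0
lr = 0.05
for _ in range(2000):
    z = mu + rng.normal(size=200)        # Monte Carlo samples from q(z; mu)
    f = log_p(z) - log_q(z, mu)          # instantaneous ELBO integrand
    g = np.mean(grad_log_q(z, mu) * f)   # score-function gradient estimate
    mu += lr * g                         # stochastic gradient ascent on the ELBO

# mu should approach the target mean, 2.0.
```

The same estimator works for any model whose log density can be evaluated pointwise, which is what makes the approach "black box"; in practice variance-reduction techniques are essential.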

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

**Friday, September 27, 3.15 pm, 2019**

**Probabilistic machine learning for volatility**
**Martin Tegnér**, Information Engineering, Dept of Engineering Science, University of Oxford

**Abstract:** This work is motivated by recent trends of rough volatility in finance. In place of these parametric models, we suggest using a non-parametric class based on linear filters on stationary processes, where the filter is randomised with a Gaussian process prior. We use variational methods to obtain a probabilistic representation of the filter that can be used for a distribution over the covariance function and its spectral content. We apply the approach to S&P 500 realised volatility data.

**Location:** Alan Turing (E-building)

**Organizer:** Mattias Villani

**Spring 2019**

**Wednesday, May 15, 3.15 pm, 2019**

**Topological and Geometric Methods for Reasoning about Data**

**Florian T. Pokorny**, Robotics, Perception and Learning Lab, KTH Royal Institute of Technology.

**Abstract:** In this talk, I will discuss our recent work on topological and geometric methods for representing and reasoning about data from a variety of application domains ranging from trajectory clustering to classification of image data. I will focus firstly on our most recent approach to extracting information about high-dimensional Voronoi Cell geometry using Monte Carlo sampling which avoids explicit computation of Voronoi representations. I will discuss how the estimation of weighted integrals over Voronoi boundaries can in particular lead to a simple yet effective geometric classification approach. Secondly, I will also discuss our work towards reasoning about motion and robotic configuration spaces based on simplicial complex representations and persistent homology.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

**Wednesday, May 8, 3.15 pm, 2019**

**Beyond the mean-field family: Variational inference with implicit distributions**

**Francisco Ruiz**, Department of Computer Science, Columbia University, and Dept of Engineering, University of Cambridge

**Abstract:** Approximating the posterior of a probabilistic model is the central challenge of Bayesian inference. One of the main approximate inference tools is variational inference (VI), which recasts inference as an optimization problem. Classical VI relies on the mean-field approximation, which constrains the variational family to be a fully factorized distribution. While useful, the mean-field assumption may lead to variational families that are not expressive enough to approximate the posterior. In this talk, I present two different ways to expand the expressiveness of the variational family using implicit distributions. First, I describe unbiased implicit VI (UIVI), a method that obtains an implicit variational distribution in a hierarchical manner using simple but flexible reparameterizable distributions. This construction enables unbiased stochastic gradients of the variational objective, making optimization tractable. Second, I describe a method to improve the variational distribution using Markov chain Monte Carlo (MCMC), leveraging the advantages of both inference techniques. To make inference tractable, we introduce the variational contrastive divergence (VCD), a divergence that replaces the standard variational objective based on the Kullback-Leibler divergence. Both UIVI and the VCD are demonstrated empirically through a set of experiments on several probabilistic models.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Fredrik Lindsten

**Wednesday, April 24, 3.15 pm, 2019**

**Evaluating the Performance of Soccer Players**
**Jesse Davis**, Department of Computer Science, KU Leuven

**Abstract:** Over the last 25 years, there has been tremendous interest in applying computational techniques to analyze sports. This area has exploded in the past decade as modern data collection techniques have enabled collecting large amounts of data about games and athletes. From a computer science perspective, sports data are very rich and complicated, which poses a number of interesting analysis challenges such as the lack of ground truth labels, the need to construct relevant features, and changing contexts. I will begin the talk by highlighting some of the most important general challenges. Then I will focus on our efforts to assess the performance of soccer players during a match. First, I will describe our approach for assigning values to all on-ball actions during a match. This goes beyond standard approaches such as expected goals and assists that only value a small subset of actions. Second, I will describe our recent research on trying to understand how mental pressure affects performance. I will explain our mental pressure model, which assigns a pressure level to each minute of a match by considering both the match context and the current game state. This enables comparing soccer players' performances across different levels of mental pressure. Finally, I will show our approach's ability to provide actionable insights for soccer clubs in four relevant use cases: player acquisition, training, tactical decisions, and lineups and substitutions.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Patrick Lambrix

**Wednesday, March 27, 3.15 pm, 2019**

**Conformal prediction**

**Henrik Boström**, Department of Software and Computer Systems, KTH Royal Institute of Technology

**Abstract:** Conformal prediction is a framework for quantifying the uncertainty of predictions provided by standard machine learning algorithms. When employing the framework, the probability of making incorrect predictions is bounded by a user-provided confidence threshold. In this talk, we will briefly introduce the framework and illustrate its use in conjunction with both interpretable models, such as decision trees, and highly predictive models, such as random forests.
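For readers unfamiliar with the framework, a minimal split-conformal regression sketch conveys the core idea of the user-chosen confidence level. This is my own illustration on synthetic data with a scikit-learn regressor, not the speaker's code:

```python
# Split-conformal prediction sketch (illustrative only): calibrate a
# nonconformity quantile on held-out data, then form prediction intervals.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X[:, 0] + 0.1 * rng.normal(size=2000)

# Split into a proper training set and a calibration set.
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# For miscoverage level alpha, take the finite-sample-corrected quantile.
alpha = 0.1
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Prediction interval for a new point: point prediction +/- q.
x_new = np.array([[0.5, 0.0, 0.0]])
pred = model.predict(x_new)[0]
interval = (pred - q, pred + q)
```

Under exchangeability, intervals built this way contain the true response with probability at least 1 - alpha, regardless of which underlying learner produced the point predictions.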

**Location:** Ada Lovelace (Visionen)

**Organizer:** Oleg Sysoev

**Wednesday, February 27, 3.15 pm, 2019**

**Reliable Semi-Supervised Learning when Labels are Missing at Random**

**Dave Zachariah**, Department of Information Technology, Division of Systems and Control, Uppsala University

**Abstract:** Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been reported to impair the performance in certain cases. In this talk we discuss some fundamental limitations to semi-supervised learning and restrictive assumptions which result in unreliable classifiers. We also propose a learning approach that relaxes such assumptions and is capable of providing classifiers that reliably quantify the label uncertainty.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Fredrik Lindsten

**Past seminars 2018**

## Wednesday, February 28, 3.15 pm, 2018

**Conditionally Independent Multiresolution Gaussian Processes**

**Jalil Taghia**, Dept. of Information Technology, Uppsala University

**Abstract:** Multiresolution Gaussian processes (GPs) based on hierarchical application of predictive processes assume full independence among GPs across resolutions. The full independence assumption results in models which are inherently susceptible to overfitting, and approximations which are non-smooth at the boundaries. Here, we consider a model variant which assumes conditional independence among GPs across resolutions. We characterize each GP using a particular representation of the Karhunen-Loève expansion where each basis vector of the representation consists of an axis and a scale factor, referred to as the basis axis and the basis-axis scale. The basis axes have unique characteristics: They are zero-mean by construction and are on the unit sphere. The axes are modeled using Bingham distributions---a natural choice for modeling axial data. Given the axes, all GPs across resolutions are independent---this is in direct contrast to the common assumption of full independence between GPs. More specifically, all GPs are tied to the same set of axes but the basis-axis scales of each GP are specific to the resolution on which they are defined. Relaxing the full independence assumption helps reduce overfitting, which can be a problem in an otherwise identical model architecture with the full independence assumption. We consider a Bayesian treatment of the model using variational inference.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

## Wednesday, March 28, 3.15 pm, 2018

**Data-Driven Text Simplification**

Sanja Štajner, Data and Web Science Group, University of Mannheim

**Abstract:** Syntactically and lexically complex texts and sentences pose difficulties both for humans (especially people with various reading or cognitive impairments, or non-native speakers) and for natural language processing systems (e.g. information extraction, machine translation, summarization, semantic role labeling). In the last 30 years, many systems have been proposed that attempt to automatically simplify the vocabulary and sentence structure of complex sentences. This talk will present the existing resources for data-driven text simplification and the latest data-driven approaches to text simplification, based on the use of word embeddings and neural machine translation architectures. The emphasis will be on comparative evaluation of those systems and a discussion of possible avenues to improve them.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Arne Jönsson

## Wednesday, April 25, 3.15 pm, 2018

**Invariant Causal Prediction**

Jonas Peters, Dept. of Mathematical Sciences, University of Copenhagen

**Abstract:** Why are we interested in the causal structure of a process? In classical prediction tasks such as regression, for example, it seems that no causal knowledge is required. In many situations, however, we want to understand how a system reacts under interventions, e.g., in gene knock-out experiments. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction uses only direct causes of the target variable as predictors; it remains valid even if we intervene on predictor variables or change the whole experimental setting. In this talk, we show how we can exploit this invariance principle to estimate causal structure from data. We apply the methodology to data sets from biology, epidemiology, and finance. The talk does not require any knowledge about causal concepts.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Jose M Pena

## Wednesday, May 23, 3.15 pm, 2018

**Transformation Forests**

Torsten Hothorn, Epidemiology, Biostatistics and Prevention Institute, University of Zurich

**Abstract:** Regression models for supervised learning problems with a continuous response are commonly understood as models for the conditional mean of the response given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel 'transformation tree' algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce 'transformation forests' as an adaptive local likelihood estimator of conditional distribution functions. The resulting predictive distributions are fully parametric yet very general and allow inference procedures, such as likelihood-based variable importances, to be applied in a straightforward way. The procedure allows general transformation models to be estimated without the necessity of a priori specifying the dependency structure of parameters. Applications include the computation of probabilistic forecasts, modelling differential treatment effects, or the derivation of counterfactual distributions for all types of response variables.

Technical Report available from arXiv

**Location:** Ada Lovelace (Visionen)

**Organizer:** Oleg Sysoev

## Wednesday, November 7, 3.15 pm, 2018

**Conjugate Bayes for Probit Regression via Unified Skew-Normals**

Daniele Durante, Department of Decision Sciences, Bocconi University, Italy

**Abstract:** Regression models for dichotomous data are ubiquitous in statistics. Besides being useful for inference on binary responses, such methods are also fundamental building-blocks in more complex formulations, covering density regression, nonparametric classification, graphical models, and others. Within the Bayesian setting, inference typically proceeds by updating the Gaussian priors for the coefficients with the likelihood induced by probit or logit regressions for the binary responses. In this updating, the apparent absence of a tractable posterior has motivated a variety of computational methods, including Markov chain Monte Carlo (MCMC) routines and algorithms which approximate the posterior. Despite being routinely implemented, current MCMC methodologies face mixing or time-efficiency issues in large p and small n studies, whereas approximate routines fail to capture the skewness typically observed in the posterior. In this seminar, I will show that the posterior distribution for the probit coefficients indeed has a unified skew-normal kernel under Gaussian priors. This result allows fast and accurate Bayesian inference for a wide class of applications, especially in large p and small-to-moderate n studies where state-of-the-art computational methods face substantial issues. These notable advances are quantitatively outlined in a genetic study and are further generalized to improve classification via Bayesian Additive Regression Trees (BART).

**Location:** Ada Lovelace (Visionen)

**Organizer:** Hector Rodriguez-Deniz

## Wednesday, December 5, 3.15 pm, 2018

**Accelerating Sequential Monte Carlo and Markov chain Monte Carlo with (deterministic) approximations**

**Jouni Helske**, Division of Media and Information Technology, Linköping University.

**Abstract:** Inference of Bayesian latent variable models can be grouped into deterministic and Monte-Carlo-based methods. The former can often provide accurate and rapid inferences, but are typically associated with biases that are hard to quantify. The latter enjoy asymptotic consistency, but can suffer from high computational costs. In this talk I will show how these approaches can be combined in a way which provides more efficient sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) algorithms. Our proposed SMC strategy uses approximations for "twisted targets" with a look-ahead property, allowing the use of fewer particles than "plain" SMC, while also being less sensitive to the processing order of the variables in a probabilistic graphical model (PGM) context. The proposed MCMC approach first uses MCMC targeting an approximate marginal of the target distribution, while the subsequent weighting scheme (based on SMC or importance sampling (IS)) provides consistent weighted estimators. This IS-MCMC approach provides a natural alternative to delayed acceptance (DA) pseudomarginal/particle MCMC, and has many advantages over DA, including straightforward parallelisation and additional flexibility in MCMC implementation.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

**Past seminars 2017**

## Wednesday, February 1, 3.15 pm, 2017

**On priors and Bayesian predictive methods for covariate selection in large p, small n regression**

Aki Vehtari, Computer Science, Aalto University

**Abstract:** I first present recent developments in hierarchical shrinkage priors for presenting sparsity assumptions in covariate effects. I review an easy and intuitive way of setting up the prior based on our prior beliefs about the number of effectively nonzero coefficients in the model. I also discuss the computational issues when using hierarchical shrinkage priors. I emphasise the separation between prior information on sparsity and a decision-theoretic approach for selecting a smaller set of covariates with good predictive performance. I briefly review a comparison of Bayesian predictive methods for model selection and discuss in more detail the projection predictive variable selection approach for regression.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

## Wednesday, March 1, 3.15 pm, 2017

**Visualizing Data using Embeddings**

Laurens van der Maaten, Facebook AI Research

**Abstract:** Visualization techniques are essential tools for every data scientist. Unfortunately, the majority of visualization techniques can only be used to inspect a limited number of variables of interest simultaneously. As a result, these techniques are not suitable for big data that is very high-dimensional.

An effective way to visualize high-dimensional data is to represent each data object by a two-dimensional point in such a way that similar objects are represented by nearby points, and that dissimilar objects are represented by distant points. The resulting two-dimensional points can be visualized in a scatter plot. This leads to a map of the data that reveals the underlying structure of the objects, such as the presence of clusters.

The talk presents techniques to embed high-dimensional objects in a two-dimensional map. In particular it focuses on a technique called t-Distributed Stochastic Neighbor Embedding (t-SNE) that produces substantially better results than alternative techniques. We demonstrate the value of t-SNE in domains such as computer vision and bioinformatics. In addition, we show how to scale up t-SNE to sets with millions of objects, and we present variants of the technique that can visualize objects of which the similarities cannot appropriately be modeled in a single map (such as semantic similarities between words) and that can visualize data based on partial similarity rankings of the form "A is more similar to B than to C".

The work presented in this talk was done jointly with Geoffrey Hinton and Kilian Weinberger.
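As a concrete usage sketch, t-SNE is available in scikit-learn; the dataset choice below is mine, purely for illustration of the map-making idea described above:

```python
# Embed high-dimensional data into a 2-D map with t-SNE (illustrative sketch).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 1797 handwritten digit images, each a 64-dimensional vector.
X, y = load_digits(return_X_y=True)

# Embed into 2-D points; similar digits should become nearby points,
# so the ten digit classes tend to appear as clusters in a scatter plot.
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)

print(emb.shape)  # (1797, 2)
```

The resulting `emb` array can be passed directly to a scatter-plot routine, colored by `y`, to reveal the cluster structure.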

**Location:** Ada Lovelace (Visionen)

**Organizer:** Leif Jonsson

## Wednesday, March 29, 3.15 pm, 2017

**Machine Learning in Production: Challenges and Choices**

Theodoros Vasiloudis, SICS Swedish ICT

**Abstract:** As machine learning (ML) finds its way into more and more areas in our life, software developers from all fields are asked to navigate an increasingly complex maze of tools and algorithms to extract value out of massive datasets. Despite the importance that machine learning programs have in production systems, the specific challenges they pose have not been studied extensively.

In this talk we will present an overview of the literature on machine learning in production and discuss the challenges of a complete deployment pipeline: design, implementation, testing, deployment, and monitoring. The talk will include considerations for issues like data readiness, algorithm and software selection, and we'll try to point out some common mistakes and misconceptions in the development and deployment of machine learning systems.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Oleg Sysoev

## Wednesday, April 26, 3.15 pm, 2017

**Deep Learning with Uncertainty**

Andrew Gordon Wilson, Cornell University

**Abstract:** In this talk, we approach model construction from a probabilistic perspective. First, we introduce a scalable Gaussian process framework capable of learning expressive kernel functions on large datasets. We then develop this framework into an approach for deep kernel learning, with non-parametric capacity, inductive biases given by deep architectures, full predictive distributions, and automatic complexity calibration. We will consider applications in image inpainting, crime prediction, epidemiology, counterfactuals, autonomous vehicles, astronomy, and human learning, including very recent state of the art results.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Per Sidén

## Wednesday, May 24, 3.15 pm, 2017

**Corpus Curation, Latent Semantics, and the Theory of Topic Modeling**

David Mimno, Cornell University

**Abstract:** Topic models have been in widespread use for more than a decade. But we are only now starting to recognize what these models are really doing and how and why people actually use them. In this talk I'll cover recent theoretical work that places topic models in a larger context that includes LSA and word embeddings. I'll also cover practical work that recognizes the choices made in using topic models to study documents, from epistemology to stemming and stopword removal. These results have specific implications for how we use statistical document models. But the effect of those choices also informs us about what these models are really doing.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Måns Magnusson

## Wednesday, September 13, 3.15 pm, 2017

**Open Research Problems in Computer Graphics and Games**
**J.P. Lewis**, SEED Research Lab, Electronic Arts

**Abstract:** The computer games and movie visual effects industries are increasingly tracking and adopting academic research in machine learning and computer vision. This talk will survey some of these applications. The talk will then mention some open research problems motivated by industry. Lastly, we will also identify assumptions in academic research that occasionally prevent promising results from being easily adopted.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

## Friday, October 13, 3.15 pm, 2017 (Note the day!)

**Multi-target prediction: a unifying view on problems and methods**

Willem Waegeman, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Belgium

**Abstract:** Traditional methods in machine learning and statistics provide data-driven models for predicting one-dimensional targets, such as binary outputs in classification and real-valued outputs in regression. In recent years, novel application domains have triggered fundamental research on more complicated problems where multi-target predictions are required. Such problems arise in diverse application domains, such as document categorization, tag recommendation of images, videos and music, information retrieval, medical decision making, drug discovery, marketing, biology, geographical information systems, etc. In this talk I will present a unifying view on multi-target prediction (MTP) in two directions. In the first part I will establish connections among different MTP problem settings, by formalizing several subfields of machine learning, such as multi-label classification, multivariate regression, multi-task learning, etc. In the second part I will describe general principles that lead to performance improvements in several types of MTP problems.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Jose M. Peña

## Wednesday, November 8, 3.15 pm, 2017

**Approximate Bayesian inference: Variational and Monte Carlo methods**

**Christian A. Naesseth**, Division of Automatic Control, Department of Electrical Engineering, Linköping University

**Abstract:** Many recent advances in large scale probabilistic inference rely on the combination of variational and Monte Carlo (MC) methods. The success of these approaches depends on (i) formulating a flexible parametric family of distributions, and (ii) optimizing the parameters to find the member of this family that most closely approximates the exact posterior. My aim is to show how MC methods can be used not only for stochastic optimization of the variational parameters, but also for defining a more flexible parametric approximation in the first place. First, I will review variational inference (VI). Second, I describe some of the pivotal tools for VI, based on MC methods and stochastic optimization, that have been developed in the last few years. Finally, I will show how we can synthesize sequential Monte Carlo methods and VI to learn more accurate posterior approximations with theoretical guarantees.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Mattias Villani

## Wednesday, December 6, 3.15 pm, 2017

**Lexicon-Supervised Word Sense Embedding Methods**

**Richard Johansson**, Department of Computer Science and Engineering, Chalmers University and University of Gothenburg

**Abstract:** In the last few years, word embeddings have become part of the standard toolbox in machine learning for language processing applications. However, the most widely used word embedding methods associate each word type with a single vectorial representation, without taking into account that many words can have several distinct meanings. In this talk, we present a number of different approaches to building multi-sense embeddings, where each word type can be associated with more than one representation, and we show how the sense embeddings can be improved by guiding the learning process using a graph-structured lexicon. We apply the sense embeddings in applications such as word sense disambiguation and semantic lexicon expansion.

**Location:** Ada Lovelace (Visionen)

**Organizer:** Marco Kuhlmann