Photo: Karl Öfverström

**Seminars Spring 2020**

**The LiU Seminar Series in Statistics and Mathematical Statistics**

**Tuesday, February 4, 3.15 pm, 2020. Seminar in Mathematical Statistics.**

Minimum distance histograms with universal performance guarantees

**Raazesh Sainudiin,** Department of Mathematics, Uppsala University

**Abstract:** We present a data-adaptive multivariate histogram estimator of an unknown density f based on n independent samples from it. Such histograms are based on binary trees called regular pavings (RPs). RPs represent a computationally convenient class of simple functions that remain closed under addition and scalar multiplication. Unlike other density estimation methods, including various regularization and Bayesian methods based on the likelihood, the minimum distance estimate (MDE) is guaranteed to be within an L1 distance bound from f for a given n, no matter what the underlying f happens to be, and is thus said to have universal performance guarantees (Devroye and Lugosi, Combinatorial methods in density estimation. Springer, New York, 2001). Using a form of tree matrix arithmetic with RPs, we obtain the first generic constructions of an MDE, prove that it has universal performance guarantees and demonstrate its performance with simulated and real-world data. Our main contribution is a constructive implementation of an MDE histogram that can handle large multivariate data bursts using a tree-based partition that is computationally conducive to subsequent statistical operations.

**Location:** Hopningspunkten.

**Tuesday, March 3, 3.15 pm, 2020. Seminar in Statistics.**

**TBA**

**Maria Knorps**, IF Research Polska

**Abstract:** TBA

**Location:** Alan Turing.

**Tuesday, March 17, 3.15 pm, 2020. Seminar in Statistics.**

**TBA**

**Moritz Schauer,** Department of Mathematical Sciences, Chalmers University of Technology

**Abstract:** TBA

**Location:** Alan Turing.

**Tuesday, March 31, 3.15 pm, 2020. Seminar in Mathematical Statistics.**

**TBA**

**Annika Lang,** Department of Mathematical Sciences, Chalmers University of Technology

**Abstract:** TBA

**Location:** Hopningspunkten.

**Tuesday, May 5, 3.15 pm, 2020. Seminar in Statistics.**

**TBA**

**Magdalena Bogdańska**, Comarch SA

**Abstract:** TBA

**Location:** Alan Turing.

**Tuesday, May 12, 3.15 pm, 2020. Seminar in Statistics.**

**TBA**

**Simone Blomberg**, School of Biological Sciences, The University of Queensland

**Abstract:** TBA

**Location:** Alan Turing.

**Tuesday, May 26, 3.15 pm, 2020. Seminar in Mathematical Statistics.**

**TBA**

**Anja Janssen**, Department of Mathematics, KTH

**Abstract:** TBA

**Location:** Hopningspunkten.

**Seminars Fall 2019**

**Tuesday, November 26, 3.15 pm, 2019. Seminar in Statistics.**

**Adaptive Bayesian SLOPE: High-dimensional Model Selection with Missing Values**

**Małgorzata Bogdan,** Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology

**Abstract:** We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure, adaptive Bayesian SLOPE, which effectively combines the SLOPE method (sorted L1 regularization) with the Spike-and-Slab LASSO method. We position our approach within a Bayesian framework which allows for simultaneous variable selection and parameter estimation, despite the missing values. As with the Spike-and-Slab LASSO, the coefficients are regarded as arising from a hierarchical model consisting of two groups: (1) the spike for the inactive and (2) the slab for the active. However, instead of assigning independent spike priors for each covariate, here we deploy a joint "SLOPE" spike prior which takes into account the ordering of coefficient magnitudes in order to control for false discoveries. Through extensive simulations, we demonstrate satisfactory performance in terms of power, FDR and estimation bias under a wide range of scenarios. Finally, we analyze a real dataset consisting of patients from Paris hospitals who underwent severe trauma, where we show excellent performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped into an R package ABSLOPE for public use. This is joint work with Wei Jiang and Julie Josse from Ecole Polytechnique, Błażej Miasojedow from University of Warsaw, Veronika Rockova from the Chicago Booth School of Business and the TraumaBase group from the Hospital Beaujon, APHP, France.

**Location:** Alan Turing.

**Tuesday, October 29, 3.15 pm, 2019. Seminar in Statistics.**

**Central limit theorems for functionals of large sample covariance matrix and mean vector in matrix-variate location mixture of normal distributions**

**Stepan Mazur,** Örebro University School of Business (jointly with Taras Bodnar and Nestor Parolya)

**Abstract:** In this talk, we consider the asymptotic distributions of functionals of the sample covariance matrix and the sample mean vector obtained under the assumption that the matrix of observations has a matrix-variate location mixture of normal distributions. The central limit theorem is derived for the product of the sample covariance matrix and the sample mean vector. Moreover, we consider the product of the inverse sample covariance matrix and the mean vector for which the central limit theorem is established as well. All results are obtained under the large-dimensional asymptotic regime, where the dimension p and the sample size n approach infinity such that p/n → c ∈ [0, +∞) when the sample covariance matrix does not need to be invertible and p/n → c ∈ [0, 1) otherwise.

**Location:** Alan Turing.

**Tuesday, October 15, 3.15 pm, 2019. Seminar in Mathematical Statistics.**

**Moment constrained optimal dividends: precommitment & consistent planning**

**Kristoffer Lindensjö,** Department of Mathematics, Stockholm University

**Abstract:** A moment constraint that limits the number of dividends in the optimal dividend problem is suggested. This leads to a new type of time-inconsistent stochastic impulse control problem. First, the optimal solution in the precommitment sense is derived. Second, the problem is formulated as an intrapersonal sequential dynamic game in line with Strotz' consistent planning. In particular, the notions of pure dividend strategies and a (strong) subgame perfect Nash equilibrium are adapted. An equilibrium is derived using a smooth fit condition. The equilibrium is shown to be strong. The uncontrolled state process is a fairly general diffusion.

**Location:** Hopningspunkten.

**Tuesday, October 1, 3.15 pm, 2019. Seminar in Statistics.**

**Hierarchical Bayesian mixture modeling of resting-state functional brain connectivity. Cross-sectional and longitudinal approaches**

**Tetiana Gorbach,** Department of Integrative Medical Biology and Umeå School of Business, Economics and Statistics, Umeå University

**Abstract:** During the seminar, two studies of functional brain connectivity will be presented. In the first cross-sectional study, we propose a Bayesian hierarchical mixture model to analyze functional brain connectivity where mixture components represent "connected" and "non-connected" brain regions. Mixture modelling provides a data-informed separation of reliable connections from noise in contrast to arbitrary thresholding of a connectivity matrix. The hierarchical structure of the model allows simultaneous inferences for the entire population and each subject separately. We show that the posterior probability that a given pair of brain regions is connected, given the observed correlation of the regions' activity, might be superior to the correlation measure of connectivity. The applicability of the introduced method is exemplified by a study of functional resting-state brain connectivity in older adults based on the data from the Betula project.

In the second study, we extend cross-sectional modeling to longitudinal data with two scheduled measurements and dropout at the second measurement occasion. We develop a model that provides valid inferences when dropout from the study is not at random. The analysis of longitudinal data demonstrates that longitudinal estimates of connectivity changes may deviate from cross-sectional estimates. The simulation study shows that ignoring the dropout mechanism may yield erroneous conclusions regarding connectivity changes.

**Location:** Alan Turing.

**Tuesday, September 17, 3.15 pm, 2019. Seminar in Mathematical Statistics.**

**Bayesian learning of weakly structural Markov graph laws using sequential Monte Carlo methods**

**Jimmy Olsson,** Department of Mathematics, KTH

**Abstract:** We shall discuss a sequential Monte Carlo-based approach to approximation of weakly structural Markov graph laws on spaces of decomposable graphs, or, more generally, spaces of junction (clique) trees associated with such graphs. In particular, we apply a particle Gibbs version of the algorithm to Bayesian structure learning in decomposable graphical models, where the target distribution is a junction tree posterior distribution. Moreover, we use the proposed algorithm for exploring certain fundamental combinatorial properties of decomposable graphs, e.g. clique size distributions. Our approach requires the design of a family of proposal kernels, so-called junction tree expanders, which expand junction trees by randomly connecting new nodes to the underlying graphs. The performance of the estimators is illustrated through a collection of numerical examples demonstrating the feasibility of the suggested approach in high-dimensional domains.

**Location:** Hopningspunkten.

**Tuesday, September 3, 3.15 pm, 2019. Seminar in Statistics.**

**Rao score test for BCS structured covariance matrix under high-dimensional regime**

**Jolanta Pielaszkiewicz,** Division of Statistics and Machine Learning, Department of Computer and Information Science, Linköping University

**Abstract:** The Rao score test for hypotheses about the Block Compound Symmetry structure of a covariance matrix will be presented and then modified to a version that is appropriate for the analysis of high-dimensional data, i.e. a data matrix whose numbers of columns and rows increase at an asymptotically constant ratio. The asymptotic distribution of the modified test statistic will be derived using tools of random matrix theory and will extend earlier results on independence and sphericity testing.

**Location:** Alan Turing.

**Seminars Spring 2019**

**Tuesday, May 14, 3.15 pm, 2019. Seminar in Statistics.**

**Usage of gaps between observations**

**Magnus Ekström,** Statistics, Umeå School of Business, Economics and Statistics, Umeå University

**Abstract:** In this talk, I will provide a brief overview of some basic ideas in the theory of gaps between successive observations (a.k.a. spacings or sample spacings). After reviewing some basic properties, I will discuss the use of spacings in estimating parameters and in testing statistical hypotheses. It will be argued that such methods of statistical inference have asymptotic properties that closely parallel those of likelihood-based methods in regular parametric models. Moreover, they can be shown to work also in unbounded likelihood problems, where both the maximum likelihood method and the generalized likelihood ratio test may break down. Unlike the maximum likelihood estimators, some variants of the estimators based on spacings are quite robust under heavy contamination.

**Location:** Alan Turing.

**Tuesday, May 7, 3.15 pm, 2019. Seminar in Mathematical Statistics.**

**Statistical Learning as a Compression Problem from the Information Theory Perspective**

**Chun-Biu Li,** Mathematical Statistics, Department of Mathematics, Stockholm University

**Abstract:** Although it was introduced in the context of communication theory, modern information theory provides us with a nonparametric probabilistic framework for statistical learning free from a priori assumptions on the underlying statistical model. In this talk, I will discuss some of the information theory based methods for unsupervised and supervised learning. In particular, the soft (fuzzy) clustering problem in unsupervised learning can be viewed as a tradeoff between data compression and minimizing the distortion of the data. Similarly, modeling in supervised learning can be treated as a tradeoff between compression of the predictor variables and retaining the relevant information about the response variable. To illustrate the usage of these methods, some applications in biophysical problems and time series analysis will be addressed in the talk.

**Location:** Hopningspunkten.

**Tuesday, April 16, 3.15 pm, 2019. Seminar in Statistics.**

**Single cell analysis and cell type identification in medical research**

**Sandra Lilja,** Division of Children's and Women's Health, Department of Clinical and Experimental Medicine, Linköping University

**Abstract:** Many patients today do not respond to treatment and an important reason for this may be the involvement of thousands of genes in multiple cell types. Single cell analysis can thus help to gain a systems level understanding of diseases, in order to find efficient diagnostic markers and treatments, as it allows for analysis of the expression of all genes in thousands of individual cells, one by one.

A bottleneck during single cell analysis is the computational identification of the different cell types in the tissue. During the laboratory procedure, the transcriptome of the different cells is separated and marked before sequencing. The transcript count for each gene in each cell can thus be identified. Using the transcriptomic profile of all the different cells in the samples, they can then be clustered into groups, and the corresponding cell type for each cluster identified. There are many different methods and statistical packages available for clustering and cell type identification, using different approaches. These work well in different situations, though choosing the best approach is not always easy. During the seminar I will present the most commonly used methods today, and how and when they are used. I will also go through potential drawbacks, and when cell typing using these techniques may be problematic or fail.

**Location:** Alan Turing.

**Tuesday, March 5, 3.15 pm, 2019. Seminar in Mathematical Statistics.**

**Muller's Ratchet in Populations Doomed to Extinction**

**Peter Olofsson,** Department of Mathematics, Physics and Chemical Engineering, Jönköping University

**Abstract:** Muller's ratchet is the process by which asexual populations accumulate deleterious mutations in an irreversible manner. Most mathematical models have been of the Wright-Fisher type with fixed population size and relative fitness. In contrast, we use a branching process model with absolute fitness, leading to unavoidable extinction. Individuals are divided into classes depending on how many mutations they have accumulated, and we give results for the rate of the ratchet and the size of the fittest class.

**Location:** Hopningspunkten.

**Past Seminars Fall 2018**

**Tuesday, August 28, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**Estimation of Kronecker structured covariance based on modified Cholesky decomposition**

**Chencheng Hao**, School of Statistics and Information, Shanghai University of International Business and Economics

**Abstract:** We study covariance estimation problems for high-dimensional matrix-valued data. We propose a covariance estimator for matrix-valued data based on the penalized matrix normal likelihood. A modified Cholesky decomposition of the covariance matrix is used to construct positive definite estimators. The method is applied to identify parsimony and to produce a statistically efficient estimator of a large covariance matrix of matrix-valued data. Simulation results are presented.

**Location:** Hopningspunkten.

**Tuesday, September 25, 3.15 pm, 2018. Seminar in Statistics.**

**Interval estimation for a binomial proportion**

**Per Gösta Andersson**, Department of Statistics, Stockholm University

**Abstract:** The construction of a confidence interval for the binomial parameter p is an elementary, yet not trivial problem. Many procedures, using e.g. normal approximation, have been suggested over the years, some of which will be presented and commented upon. We will start by looking at the standard Wald interval and highlight its erratic and generally bad behaviour, before moving on to intervals with substantially improved properties. Priority will be given to accurate coverage probability, although interval length is important to take into consideration.

**Location:** Alan Turing.
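As a rough illustration of the abstract above (not taken from the talk), the sketch below computes the exact coverage probability of the standard Wald interval for a binomial proportion and compares it with the Wilson score interval. The Wilson interval is chosen here only as one well-known alternative; the abstract does not say which improved intervals are covered. The function names and the 95% level are assumptions made for this example.

```python
import math

# 0.975 quantile of the standard normal distribution (two-sided 95% level).
Z95 = 1.959963984540054


def wald_interval(x, n, z=Z95):
    """Standard Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(x, n, z=Z95):
    """Wilson score interval, used here as one example of an alternative."""
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def coverage(interval, n, p, z=Z95):
    """Exact coverage: sum the binomial pmf over all x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, z)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        print(p, coverage(wald_interval, 20, p), coverage(wilson_interval, 20, p))
```

For small n and p near 0 or 1, the Wald interval degenerates to a single point when x = 0 or x = n, which is one source of the poor coverage the abstract refers to.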

**Tuesday, October 9, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**Title:** Generalized Divide and Color models

**Johan Tykesson**, Mathematical Sciences, Chalmers University of Technology and University of Gothenburg

**Abstract:** In this talk, we consider the following model: one starts with a finite or countable set V, a random partition of V and a parameter p in [0,1]. The corresponding Generalized Divide and Color Model is the {0,1}-valued process indexed by V obtained by independently, for each partition element in the random partition chosen, with probability p, assigning all the elements of the partition element the value 1, and with probability 1-p, assigning all the elements of the partition element the value 0. A very special interesting case of this is the "Divide and Color Model" (which motivates the name we use) introduced and studied by Olle Häggström. A number of quite varied well-studied processes actually fit into this context such as the Ising model, the stationary distributions for the Voter Model and random walk in random scenery. Some of the questions which we study here are the following. In what situations can different random partitions give rise to the same color process? What can one say concerning exchangeable random partitions? What is the set of product measures that a color process stochastically dominates? For random partitions which are translation invariant, what ergodic properties do the resulting color processes have? In the talk, we will focus most attention on the case when V is a finite set.

The talk is based on joint work with Jeff Steif.

**Location:** Hopningspunkten.
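The construction described in the abstract above can be sketched in a few lines for a finite set V. This is only an illustration under stated assumptions: the random partition is drawn here from a Chinese restaurant process, which is a convenient exchangeable example and not a choice made in the talk, and the function names `random_partition` and `sample_color_process` are hypothetical.

```python
import random


def random_partition(elements, rng, alpha=1.0):
    """Draw a random partition of `elements` (assumption: Chinese restaurant process).

    Each element opens a new block with probability alpha / (n_seen + alpha),
    and otherwise joins an existing block with probability proportional to its size.
    """
    blocks = []
    for n_seen, v in enumerate(elements):
        if rng.random() < alpha / (n_seen + alpha):
            blocks.append([v])
        else:
            sizes = [len(b) for b in blocks]
            rng.choices(blocks, weights=sizes, k=1)[0].append(v)
    return blocks


def sample_color_process(partition, p, rng):
    """Color each block independently: value 1 with probability p, else 0.

    All elements of a block receive the same value, as in the model definition.
    """
    colors = {}
    for block in partition:
        value = 1 if rng.random() < p else 0
        for v in block:
            colors[v] = value
    return colors


if __name__ == "__main__":
    rng = random.Random(2018)
    partition = random_partition(range(10), rng)
    print(partition)
    print(sample_color_process(partition, 0.5, rng))
```

Replacing `random_partition` with any other random partition of V leaves `sample_color_process` unchanged, which mirrors how the model separates the partition from the coloring step.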

**Tuesday, October 23, 3.15 pm, 2018. Seminar in Statistics.**

**Title:** A comprehensive approach for predicting pathogenicity in bacteria

**Sebastian Sakowski**, Faculty of Mathematics and Computer Science, University of Łódź

**Abstract:** In this presentation, I will discuss a general approach, based on the Binary State Speciation and Extinction (BiSSE) model, for predicting pathogenicity in bacterial populations from microsatellite profiling data. In particular, I will discuss an example of using the BiSSE model to estimate parameters from genetic data, specifically from a real dataset of 251 Escherichia coli strains. Additionally, I will briefly review results of research in the field of DNA computing and molecular programming.

**Location:** Alan Turing.

**Tuesday, November 6, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**Title:** Fractional limit processes in shot noise models

**Ingemar Kaj**, Department of Mathematics, Uppsala University

**Abstract:** A wide variety of random processes and spatial random fields arise naturally as Poisson shot noise models, with shots of random location and size. Such models with power-law size intensities display a range of limit processes under aggregation and suitable scaling of parameters. We discuss the various scaling regimes and their limits, which include fractional Brownian motion, fractional Poisson type motions, and stable processes. Allowing for a type of dependence between shots, yet another hybrid Gaussian-Poisson model appears in the limit.

**Location:** Hopningspunkten.

**Tuesday, December 18, 3.15 pm, 2018. Seminar in Statistics.**

**Title:** Complier average causal effect analysis using growth mixture modeling

**Speaker:** **Hugo Hesser**, Department of Behavioural Sciences and Learning, Linköping University

**Abstract:** Randomized controlled trials (RCTs) offer a unique opportunity to test causal effects. However, treatment noncompliance is a common problem in RCTs and is a major threat to causal inferences. Intention-to-treat analysis (ITT), which is widely used to estimate treatment effects in RCTs, does not provide an estimate of the effect of treatment per se in the presence of noncompliance, and ad hoc methods for dealing with noncompliance in RCTs (e.g. as-treated or per-protocol analysis) do not provide an unbiased estimate of the average causal effect (ACE). The current presentation will focus on how an unbiased local estimate of (L)ACE can be obtained for the subgroup of compliers in RCTs using complier average causal effects (CACE) analysis. The talk will focus on model identification, specifications and assumptions for obtaining maximum likelihood estimates in CACE models using growth mixture modeling in a structural equation modeling framework. Priority will be given to CACE modeling in practice for evaluating non-pharmacological treatments, and CACE analysis will be illustrated with a randomized controlled add-on component trial of an internet-delivered psychological treatment for irritable bowel syndrome (Hesser et al., 2017, Psychological Medicine).

**Location:** Alan Turing.

**Past Seminars Spring 2018**

**Tuesday, January 30, 3.15 pm, 2018. Seminar in Statistics.**

**Level set Cox processes**

**Anders Hildeman**, Mathematical Sciences, Chalmers University of Technology and University of Gothenburg

**Abstract:** Our work focuses on modelling point process data that is observed on a spatial domain consisting of several spatial regions with fundamentally different behaviour, and where the classification of the spatial domain into these regions is unknown. The aim of the analyst might be either to classify the regions, perform Kriging predictions or derive some field parameter properties from one or several of the point pattern classes.

To handle data of this type, we propose an extension to the popular log-Gaussian Cox process (LGCP) model. The LGCP model uses a latent Gaussian random field (GRF) to, a priori, characterize the Poisson intensity. Our extension is based on replacing the latent GRF by a latent spatial mixture model of GRFs. The mixture model is specified using a categorically valued random field which represents the classification of the spatial domain. This allows for parametrizing the model through stationary covariance functions and mean value functions specified using covariates. An MCMC method based on the preconditioned Crank-Nicolson MALA algorithm is proposed for Bayesian inference.

Finally, the model is demonstrated on data from the popular Barro Colorado rain forest data set. It is shown that the proposed model is able to capture behavior for which inference based on the LGCP is biased.

**Location:** Alan Turing.

**Tuesday, February 13, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**Local law of addition of random matrices on optimal scale**

**Kevin Schnelli**, KTH

**Abstract:** Describing the eigenvalue distribution of the sum of two general Hermitian matrices is a basic question going back to Weyl. If the matrices have high dimensionality and are in general position in the sense that one of them is conjugated by a random Haar unitary matrix, the eigenvalue distribution of the sum is given by the free additive convolution of the respective spectral distributions. This result was obtained by Voiculescu on the macroscopic scale. In this talk, I show that it holds on the microscopic scale all the way down to the eigenvalue spacing. This shows a remarkable rigidity phenomenon for the eigenvalues.

**Location:** Hopningspunkten.

**Tuesday, February 28, 3.15 pm, 2018. Seminar in Statistics.**

**Conditionally Independent Multiresolution Gaussian Processes**

**Abstract:** Multiresolution Gaussian processes (GPs) based on hierarchical application of predictive processes assume full independence among GPs across resolutions. The full independence assumption results in models which are inherently susceptible to overfitting, and approximations which are non-smooth at the boundaries. Here, we consider a model variant which assumes conditional independence among GPs across resolutions. We characterize each GP using a particular representation of the Karhunen-Loève expansion where each basis vector of the representation consists of an axis and a scale factor, referred to as the basis axis and the basis-axis scale. The basis axes have unique characteristics: they are zero-mean by construction and are on the unit sphere. The axes are modeled using Bingham distributions, a natural choice for modeling axial data. Given the axes, all GPs across resolutions are independent; this is in direct contrast to the common assumption of full independence between GPs. More specifically, all GPs are tied to the same set of axes but the basis-axis scales of each GP are specific to the resolution on which they are defined. Relaxing the full independence assumption helps in reducing overfitting, which can be a problem in an otherwise identical model architecture with the full independence assumption. We consider a Bayesian treatment of the model using variational inference.

**Location:** Alan Turing.

**Organizer:** Mattias Villani

**Tuesday, March 13, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**On high dimensional data analysis under errors in variables**

**Silvelyn Zwanzig**, Department of Mathematics, Uppsala University

**Abstract:** Errors in variables induce additional complications already in models with p < n. The total least squares estimator is theoretically the best estimator in this case. In the literature, methods are presented for high-dimensional sparse models with errors in variables. In the talk I will study the behavior of the total least squares estimator for non-sparse models with n << p and propose a generalized version of it.

**Location:** Hopningspunkten.

**Tuesday, March 20, 3.15 pm, 2018. (Extra) Seminar in Mathematical Statistics.**

**Estimation and residual analysis in the GMANOVA-MANOVA model**

**Béatrice Byukusenge**, Department of Mathematics, Linköping University

**Abstract:** In this talk we will consider the GMANOVA-MANOVA model, which is a special case of an extended growth curve model, with no assumption of the nested subspace condition. We derive two residuals, establish their properties and give their interpretation. Finally, a numerical example on a data set from a study conducted to investigate two treatments for patients suffering from multiple sclerosis is used to validate the theoretical results.

**Location:** Kompakta rummet.

**Tuesday, April 24, 3.15 pm, 2018. Seminar in Statistics.**

**Data-driven confounder selection for estimating average causal effects**

**Jenny Häggström,** Statistics, Umeå School of Business, Economics and Statistics, Umeå University

**Abstract:** To estimate a causal effect on an outcome of interest without bias, unconfoundedness is often assumed. If there is sufficient knowledge of the underlying causal structure, then existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates sufficient for unconfoundedness, if such subsets exist. De Luna, Waernbaum and Richardson (Biometrika, 2011), embracing the Neyman-Rubin model, characterized subsets from the original reservoir of covariates that are minimal in the sense that the treatment ceases to be unconfounded given any proper subset of these minimal sets, and proposed data-driven algorithms for the selection of minimal sets of covariates. Here, the selection of such target subsets is considered when the underlying causal structure is unknown. Persson, Häggström, Waernbaum and de Luna (CSDA, 2017) implemented the above algorithms using two model-free dimension reduction techniques: marginal co-ordinate hypothesis tests and kernel-based smoothing. Häggström (Biometrics, 2017) proposed to model the unknown causal structure by a probabilistic graphical model, e.g. a Bayesian network, estimate this graph from observed data and select the target subsets given the estimated graph. The approaches were evaluated by simulation.

**Location:** Alan Turing.

**Tuesday, May 2, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**Testing hypotheses about covariance structures under multi-level multivariate models using Rao score**

**Katarzyna Filipiak**, Poznań University of Technology

**Abstract:** Modern experimental techniques make it possible to collect and store multi-level multivariate data in almost all fields, such as agriculture, biology, biomedicine, medicine, environmental science and engineering, where the observations are collected on more than one response variable at different locations, repeatedly over time, at different "depths", etc. Before any statistical analysis, it is vital to test the appropriate mean and variance-covariance structures on the multi-level multivariate observations.

In this talk, Rao's score test (RST) statistic for testing hypotheses about variance-covariance structures, e.g. separable structures with one component structured or exchangeable structures, is presented. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data. Monte Carlo simulations are then used to study the behavior of the empirical type I error as well as the empirical null distribution of the RST statistic with respect to the sample size. It is shown that the RST outperforms the commonly used likelihood ratio test in all considered areas.

References:

[1] Roy, A., K. Filipiak, and D. Klein (2018). Testing a block exchangeable covariance matrix. Statistics 52(2), 393-408.

[2] Filipiak, K., D. Klein, and A. Roy (2017). A comparison of likelihood ratio tests and Rao's score test for three separable covariance matrix structures. Biometrical Journal 59, 192-215.

[3] Filipiak, K., D. Klein, and A. Roy (2016). Score test for a separable covariance structure with the first component as compound symmetric correlation matrix. Journal of Multivariate Analysis 150, 105-124.

**Location:** Kompakta rummet.

**Tuesday, May 22, 3.15 pm, 2018. Seminar in Statistics.**

**Large-scale MCMC samplers for probabilistic topic models**

**Måns Magnusson**, Department of Computer and Information Science, Linköping University

**Abstract:** Probabilistic topic models have proven to be an extremely versatile class of mixed-membership models for discovering the thematic structure of text collections. There are many possible applications, covering a broad range of areas of study: technology, natural science, social science and the humanities. New efficient parallel Markov Chain Monte Carlo inference algorithms are proposed for Bayesian inference in large topic models. The proposed methods scale well with the corpus size and can be used for other probabilistic topic models and other natural language processing applications. The proposed methods are fast, efficient, scalable, and will converge to the true posterior distribution.

**Location:** Alan Turing.

## Tuesday, June 5, 3.15 pm, 2018. Seminar in Mathematical Statistics.

Asymptotic integration by parts formula and regularity of probability lawsVlad Bally, Université Paris-Est Marne-la-Vallée, France

**Abstract:** Download

**Location:** Hopningspunkten.

**Thursday, June 7, 1.15 pm, 2018. Seminar in Statistics.**

**Strategies for Distributed Bayesian Computation**

Alexander Terenin, Imperial College London

**Abstract:** In this talk, I will discuss two popular approaches for running Markov Chain Monte Carlo methods on parallel and distributed systems: methods based on exchangeability, and asynchronous methods. I will discuss the precise ways in which these are able to take advantage of parallelism, and how that interacts with the system's architecture from a performance and efficiency perspective. I will then discuss how these approaches affect convergence, showcasing recent theoretical analysis of asynchronous methods, and conclude with a discussion on how reliability of Monte Carlo output and performance considerations should be considered when selecting what type of method to deploy in a given setting.

**Location:** John von Neumann.

**Tuesday, June 12, 3.15 pm, 2018. Seminar in Mathematical Statistics.**

**Stochastic Deformation of Classical Integrability**

Jean-Claude Zambrini, Department of Mathematics, University of Lisbon, Portugal

**Abstract:** Is it possible to deform, along quantum-like trajectories, one of the deepest notions of ODE theory, that of integrable systems? We shall start from a classical example, then summarize the method of Stochastic Deformation. It will provide a way to deform Jacobi's strategy to reach this goal in the classical, deterministic case. This talk is based on joint work with C. Léonard (Paris-Ouest Nanterre).

**Location:** Hopningspunkten.

**Past seminars fall 2017**

## Tuesday, September 5, 3.15 pm, 2017. Seminar in Statistics.

**Gibbs sampling for Latent Dirichlet Allocation**

**Johan Jonasson**, Department of Mathematical Sciences, Chalmers University of Technology

MCMC, and in particular Gibbs sampling, is ubiquitous in Bayesian machine learning models. In this talk I will briefly review the Latent Dirichlet Allocation model for text classification and a hidden Markov model thereof. The task is to infer topics from the text in an unsupervised way, and a common approach is to use collapsed Gibbs sampling (i.e. integrating out the unknown random parameters). It would be desirable for these samplers to converge fast, and we show that in a very simple special case, mixing time is polynomial in the number of tokens.

**Location:** Alan Turing.

## Wednesday, September 20, 3.15 pm, 2017. Seminar in Mathematical Statistics.

**Covering a subset of R^d by Poissonian random sets**

**Erik Broman**, Department of Mathematical Sciences, Chalmers University of Technology

Abstract: The problem of covering a set A by a collection of random sets dates back to Dvoretzky in 1954. Since then, a host of papers have been written on the subject. In this talk we shall review some of this history and discuss two directions in which progress has recently been made.

In the first case we consider a statistically scale invariant collection of subsets of R^d, which are chosen at random according to a Poisson process of intensity lambda. The complement of the union of these sets is then a random fractal that we denote by C. Such random fractals have been studied in many contexts, but here we are interested in the critical value of lambda for which the set C is almost surely empty (so that R^d is completely covered). Such problems were earlier studied and solved in one dimension, while here we shall present recent progress which solves it in all dimensions. This part is based on joint work with J. Jonasson and J. Tykesson.

In the second direction we consider a dynamic version of coverings. For instance, the set A could be a box of side lengths n, and then balls are raining from the sky at unit rate. One then asks for the time at which A is covered. Together with F. Mussini I have recently studied a variant in which the balls are replaced by bi-infinite cylinders. This makes the problem fundamentally different as one no longer has independence between well separated regions. Thus, new methods and techniques must be used. Our main result is that we find the correct asymptotics for the cover time as the set A grows.

**Location:** Hopningspunkten.

## Tuesday, October 3, 3.15 pm, 2017. Seminar in Statistics.

**On Probabilistic Independence Models and Graphs**

**Kayvan Sadeghi**, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge

Abstract: The main purpose of this talk is to explore the relationship between the set of conditional independence statements induced by a probability distribution and the set of separations induced by graphs as studied in graphical models. I define one general type of graph and one separation criterion, and show that almost all known types of graphs and separation criteria are a special case of these. I introduce the concepts of Markov property and faithfulness, and provide conditions under which a given probability distribution is Markov or faithful to a graph. I discuss the implications of these conditions in statistics, probability theory, and machine learning.

**Location:** Alan Turing.

## Tuesday, October 17, 3.15 pm, 2017. Seminar in Mathematical Statistics.

**Stochastic Gene Switches**

**Joanna Tyrcha**, Department of Mathematics, Stockholm University

Abstract: The timing of key events in the eukaryotic cell cycle is remarkably stochastic. Special attention has been paid to the START transition, when the cell starts to synthesize DNA. Experiments have shown that START in budding yeast proceeds in two distinct steps, both of which are stochastic. We look at the cellular reactions responsible for the stochasticity in these and similar transitions. Their dynamics can be described by stochastic differential equations, allowing us to write a path-integral representation for the transition rate. We also study the bursting limit in which we can eliminate the mRNA of our model if we study an appropriate time scale.

**Location:** Hopningspunkten.

## Tuesday, October 31, 3.15 pm, 2017. Seminar in Statistics.

**Frequentist model averaging in structural equation modelling**

**Shaobo Jin**, Department of Statistics, Uppsala University

Abstract: Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a chi-square test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.

**Location:** Alan Turing.

## Tuesday, November 14, 3.15 pm, 2017. Seminar in Mathematical Statistics.

**Random coalescing geodesics in first-passage percolation**

**Daniel Ahlberg**, Department of Mathematics, Stockholm University

Abstract: Since the work of Kardar-Parisi-Zhang in the mid 1980s, it has been widely believed that a large class of two-dimensional growth models should obey the same asymptotic behaviour. This behaviour stands in contrast to one-dimensional behaviour where fluctuations are dictated by the central limit theorem. To rigorously understand the predictions of KPZ-theory has become one of the most central themes in mathematical physics. One prominent model believed to belong to this class is known as first-passage percolation. It can be interpreted as the random metric on Z^2 obtained by assigning non-negative i.i.d. weights to the edges of the nearest neighbour lattice. We shall discuss properties of geodesics in this metric and their connection to KPZ-theory. As a first step in this direction, we answer a question posed by Benjamini, Kalai and Schramm in 2003, that has come to be known as the 'midpoint problem'. This is joint work with Chris Hoffman.

**Location:** Hopningspunkten.

## Tuesday, November 28, 3.15 pm, 2017. Seminar in Statistics.

**Probabilistic programming for statistical phylogenetics**

**Fredrik Ronquist**, Department of Bioinformatics and Genetics, Swedish Museum of Natural History

Abstract: Statistical inference based on phylogenetic models - models built around evolutionary trees - is widely used throughout the life sciences today. The field is completely dominated by Bayesian MCMC methods, which were introduced about 20 years ago. The flexibility and computational efficiency of this approach have resulted in explosive development of phylogenetic models. It has been quite challenging for computational biologists to keep up with the rapidly expanding model space, and the field is dominated today by a plethora of software packages, each dealing with a specific subset of models. There is a clear need for more generic approaches to model construction and inference. We have tried to address these challenges by developing Rev, a probabilistic programming language for statistical phylogenetics based on probabilistic graphical models. Unlike most other such languages, Rev is designed for use in an interactive computing environment, allowing users to build phylogenetic models step by step, and examine the model components as they go. I describe some of the challenges involved in developing an interactive probabilistic programming language and some of the potential and limitations of probabilistic graphical models in phylogenetics.

**Location:** Alan Turing.

## Tuesday, December 12, 3.15 pm, 2017. Seminar in Mathematical Statistics.

**Asymptotic Behaviour in Time for a Singular Stochastic Newton Equation**

**Astrid Hilbert**, Department of Mathematics, Linnaeus University

Abstract: TBA

**Location:** Hopningspunkten.

**Past seminars spring 2017**

## Tuesday, February 7, 3.15 pm, 2017. Seminar in Statistics

**Correlated Variables in High Dimensional Linear Regression Models: Clustering and Combination of Penalties based Methods**

**Niharika Gauraha**, Indian Statistical Institute, Bangalore and KTH

Variable selection in high dimensional regression problems is challenging, especially in the presence of highly correlated predictors. The Least Absolute Shrinkage and Selection Operator (Lasso) is a widely used regularized regression method for variable selection in high dimensional problems, but it tends to select a single predictor from a group of highly correlated predictors even if many or all of these predictors are relevant. We discuss the following approaches for correlated variable selection:

1. The concept of clustering or grouping correlated predictors and then pursuing group-wise model fitting. Examples include cluster Lasso methods and Stability Feature Selection using Cluster Representative Lasso.
2. Simultaneous clustering and model fitting that involves a combination of two different penalties. For example, Elastic Net is a combination of the Lasso penalty (L1) and the Ridge (L2) penalty.

Location: Alan Turing.

## Tuesday, February 21, 3.15 pm, 2017. Seminar in Mathematical Statistics

**Non-life (re)insurance pricing: an introduction**

**Alex Teterukovsky**, IF Skadeförsäkring

Insurance is about transferring risks between parties. A party who assumes a risk or a portfolio of risks is normally compensated by the other so-called ceding party. We will look into principles for how this compensation is calculated from both sides. The seminar will cover risk-based pricing of single risks, as well as portfolios of risks, both from the perspective of the insured and from that of the insurer. Special attention will be given to reinsurance pricing, i.e. the transfer of risks from the insurer to other insurers. The seminar will focus on practical issues insurance companies face in their daily work. Take your pens with you, as we'll try to have a hands-on pricing exercise, if time permits.

Location: Hopningspunkten.

## Tuesday, March 7, 3.15 pm, 2017. Seminar in Statistics

**A Punctuated Stochastic Model of Adaptation**

**Krzysztof Bartoszek**, Uppsala University and Linköping University

Contemporary stochastic evolution models commonly assume gradual change for a phenotype. However, the fossil record and biological theory suggest that development rather undergoes punctuated change. In this setup one assumes that there are very short time intervals during which dramatic change occurs, and between these the species are at stasis. Motivated by this, I will present a branching Ornstein-Uhlenbeck model with jumps at speciation points. I will in particular discuss a very recent result concerning weak convergence: for a classical Central Limit Theorem to hold, dramatic change has to be a rare event.

Location: Alan Turing.

## Monday, March 20, 3.15 pm, 2017. Seminar in Mathematical Statistics

**Galton-Watson processes with the expectation kernel having an atom**

**Serik Sagitov**, Mathematical Statistics, Chalmers University

Branching processes with infinitely many types are usually studied under the assumptions of the Perron-Frobenius theorem for the expectation kernels. The Perron-Frobenius eigenvalue then gives the growth rate of the process, allowing one to distinguish between subcritical, critical, and supercritical reproduction regimes. We consider Galton-Watson processes with a general type space whose expectation kernels have a particular structure, ensuring the existence of an embedded single-type branching process. The mean offspring number of the embedded process can be used as a criticality gauge for the multi-type Galton-Watson process even outside the Perron-Frobenius zone. The talk will be given on a level suitable for Ph.D. students.

Location: Hopningspunkten.

## Tuesday, April 4, 3.15 pm, 2017. Seminar in Statistics

**Optimal design for dose-finding in clinical trials**

**Frank Miller**, Dept. of Statistics, Stockholm University

When new drugs are developed, an important step is to determine the "right" dose to be recommended for patients. If too low a dose is recommended, the drug will not achieve sufficient effect. If too high a dose is chosen, patients have an increased risk of adverse events. Usually large clinical trials are conducted to determine the right dose. It is important that the design of these so-called dose-finding trials is good, to ensure that valuable information is obtained about the relationship between dose and effect of the drug. We apply optimal design theory to determine good designs for dose-finding studies. Since the assumed statistical model for the relationship between dose and effect is a non-linear regression model, optimal designs depend on unknown parameters. We present several methods to deal with this difficulty and illustrate them in a case study.

Location: Alan Turing.

## Tuesday, April 18, 3.15 pm, 2017. Seminar in Mathematical Statistics

**On connections between some classical mortality laws and proportional frailty**

**Mathias Lindholm,** Mathematical Statistics, Stockholm University

*Abstract:* We provide a simple frailty argument that produces the Gompertz-Makeham mortality law as the population hazard rate under the assumption of proportional frailty given a common exponential hazard rate. Further, based on a slight generalisation of the result for the Gompertz-Makeham law, the connections to Perks and Beard's mortality laws are discussed. Moreover, we give conditions under which functional forms of the baseline hazard will yield proper frailty distributions, given that we want to retrieve a certain overall population hazard within the proportional frailty framework.

Location: Hopningspunkten.

## Tuesday, May 2, 3.15 pm, 2017. Seminar in Statistics

**An overview of measures for population-based cancer survival**

**Therese Andersson**, Dept. of Medical Epidemiology and Biostatistics, Karolinska Institutet.

I will introduce the field of population-based cancer survival analysis and its role in cancer control. I will especially cover the concept of relative survival and why it is often preferred over cause-specific survival for the study of cancer patient survival using data collected by population-based cancer registers. I will also present different measures of cancer-patient survival, such as the proportion cured, the loss in life expectancy due to cancer, and crude vs net probabilities of death. Each of these measures shows different aspects of cancer patient survival, and examples from published population-based studies will be presented and discussed.

**Location:** Alan Turing.

## Tuesday, May 16, 3.15 pm, 2017. Seminar in Mathematical Statistics

**Weak convergence of individual Mahalanobis distances**

**Thomas Holgersson**, Dept. of Economics and Statistics, Linnæus University

Mahalanobis distance (MD) is used in a wide range of applications, such as graphical analysis, outlier detection, discriminant analysis, multivariate calibration, non-normality testing, construction of process control charts and many others. Although the distributional properties of sample MDs are well developed in the case of finite dimension, little is known about their behavior in high-dimensional settings. We present some types of weak convergence of sample Mahalanobis distances, along with some other limiting properties.

**Location:** Hopningspunkten.