What is Mathematical Methods for Arbitrary Data Sources?

The lecture series will collect talks on mathematical disciplines related to all kinds of data, ranging from statistics and machine learning to model-based approaches and inverse problems. Each pair of talks will address a specific direction, e.g., a NoMADS session related to nonlocal approaches or a DeepMADS session related to deep learning.

The series is created in the spirit of the One World Series pioneered by the seminars in probability and PDE.

Using Zoom

For this online seminar we will use Zoom as the video service. Approximately 15 minutes prior to the beginning of each lecture, a Zoom link will be provided on this website and via the mailing list.

Mailing list

Please subscribe to our mailing list by filling out this form.

Materials and Video Recordings

Additional material, slides, and video recordings are attached to the respective sessions in our session archive. Please also visit our YouTube channel for video recordings of past sessions.

Past Sessions

  • Apr 20, 2020: Session I

    Gabriel Peyré (CNRS and Ecole Normale Supérieure, FR)
    Scaling Optimal Transport for High dimensional Learning

    Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is, however, plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will review entropic regularization methods which define geometric loss functions approximating OT with a better sample complexity. More information and references can be found on the website of our book Computational Optimal Transport.
    (A small illustrative sketch of entropic regularization via Sinkhorn iterations follows after this session's entries.)
    video
    slides
    Marie-Therese Wolfram (Warwick University, UK)
    joint work with: Andrew Stuart (Caltech, US)
    Inverse Optimal Transport

    Discrete optimal transportation problems arise in various contexts in engineering, the sciences and the social sciences. Examples include the marriage market in economics or international migration flows in demographics. Often the underlying cost criterion is unknown, or only partly known, and the observed optimal solutions are corrupted by noise. In this talk we discuss a systematic approach to infer unknown costs from noisy observations of optimal transportation plans. The proposed methodologies are developed within the Bayesian framework for inverse problems and require only the ability to solve the forward optimal transport problem, which is a linear program, and to generate random numbers. We illustrate our approach using the example of international migration flows. Here, reported migration flow data captures (noisily) the number of individuals moving from one country to another in a given period of time. It can be interpreted as a noisy observation of an optimal transportation map, with costs related to the geographical position of countries. We use a graph-based formulation of the problem, with countries at the nodes of graphs and non-zero weighted adjacencies only on edges between countries which share a border. We use the proposed algorithm to estimate the weights, which represent the cost of transition, and to quantify uncertainty in these weights.
    video
    slides
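    The following is a minimal sketch of entropic regularization for discrete optimal transport computed with Sinkhorn iterations, the standard algorithmic workhorse behind the geometric loss functions mentioned in the first talk of this session. It is a generic construction, not code from the talk; the toy point clouds, the regularization strength eps and the iteration count are arbitrary choices for illustration.

```python
# Generic sketch: entropic-regularized discrete OT via Sinkhorn iterations.
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Approximate the entropic-regularized OT plan between histograms a and b
    with cost matrix C and regularization strength eps."""
    K = np.exp(-C / eps)                    # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                   # scale columns to match marginal b
        u = a / (K @ v)                     # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]         # approximate transport plan
    return P, float(np.sum(P * C))          # plan and associated transport cost

# Toy example: two small point clouds in 2D with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = rng.normal(loc=2.0, size=(40, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean costs
C = C / C.max()                                      # normalize costs to [0, 1] to avoid underflow
a = np.full(30, 1.0 / 30)
b = np.full(40, 1.0 / 40)
P, cost = sinkhorn(a, b, C)
print(P.shape, cost)
```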
  • May 4, 2020: Session II

    Lorenzo Rosasco (Università di Genova, IT, and MIT, US)
    Efficient learning with random projections

    Despite stunning performances, state-of-the-art machine learning approaches are often computationally intensive, and efficiency remains a challenge. Dimensionality reduction, if performed efficiently, provides a way to reduce the computational requirements of downstream tasks, but possibly at the expense of the obtained accuracy. In this talk, we discuss the interplay between accuracy and efficiency when dimensionality reduction is performed by means of, possibly data-dependent, random projections. The latter are related to discretization methods for integral operators, to sampling methods in randomized numerical linear algebra, and to sketching methods. Our results show that there are a number of different tasks and regimes where, using random projections and regularization, efficiency can be improved with no cost in accuracy. Theoretical results are used to derive scalable and fast kernel methods for datasets with millions of points.
    video
    slides
    Michaël Fanuel (KU Leuven, BE)
    joint work with: Joachim Schreurs, Johan Suykens
    Diversity sampling in kernel methods

    A well-known technique for large-scale kernel methods is the Nyström approximation. Based on a subset of landmarks, it gives a low-rank approximation of the kernel matrix and is known to provide a form of implicit regularization. We will discuss the impact of sampling diverse landmarks for constructing the Nyström approximation in supervised and unsupervised problems. In particular, three methods will be considered: uniform sampling, leverage score sampling and Determinantal Point Processes (DPP). The implicit regularization due to the diversity of the landmarks will be made explicit by numerical simulations and analysed further in the case of DPP sampling by some theoretical results.
    (A small illustrative sketch of the Nyström approximation with uniformly sampled landmarks follows after this session's entries.)
    video
    slides
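    The following is a minimal sketch of the Nyström low-rank approximation of a Gaussian kernel matrix from uniformly sampled landmarks, the basic construction discussed in the second talk of this session (leverage score or DPP sampling would replace the uniform choice of landmarks). It is a generic construction, not the speakers' code; the data, kernel bandwidth and number of landmarks are arbitrary illustration choices.

```python
# Generic sketch: Nyström low-rank approximation of a Gaussian kernel matrix.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # data cloud
idx = rng.choice(len(X), size=50, replace=False)   # uniform landmark sampling
L = X[idx]

K_nm = gaussian_kernel(X, L)                       # n x m cross-kernel
K_mm = gaussian_kernel(L, L)                       # m x m landmark kernel
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T    # Nyström approximation of K

K_full = gaussian_kernel(X, X)
print("relative error:", np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```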
  • May 18, 2020: Session III

    Please note: The second talk by Francis Bach was jointly organized with the One World Optimization Seminar.

    Lars Ruthotto (Emory University, US)
    Machine Learning meets Optimal Transport: Old solutions for new problems and vice versa

    This talk presents new connections between optimal transport (OT), which has been a critical problem in applied mathematics for centuries, and machine learning (ML), which has been receiving enormous attention in the past decades. In recent years, OT and ML have become increasingly intertwined. This talk contributes to this booming intersection by providing efficient and scalable computational methods for OT and ML.
    The first part of the talk shows how neural networks can be used to efficiently approximate the optimal transport map between two densities in high dimensions. To avoid the curse of dimensionality, we combine Lagrangian and Eulerian viewpoints and employ neural networks to solve the underlying Hamilton-Jacobi-Bellman equation. Our approach avoids any space discretization and can be implemented in existing machine learning frameworks. We present numerical results for OT in up to 100 dimensions and validate our solver in a two-dimensional setting.
    The second part of the talk shows how optimal transport theory can improve the efficiency of training generative models and density estimators, which are critical in machine learning. We consider continuous normalizing flows (CNF), which have emerged as one of the most promising approaches for variational inference in the ML community. Our numerical implementation is a discretize-optimize method whose forward problem relies on manually derived gradients and the Laplacian of the neural network and uses automatic differentiation in the optimization. In common benchmark challenges, our method outperforms state-of-the-art CNF approaches by reducing the network size by 8x, accelerating training by 10x-40x, and allowing 30x-50x faster inference.
    video
    slides
    Francis Bach (Université PSL, FR)
    joint work with: Lénaïc Chizat
    On the convergence of gradient descent for wide two-layer neural networks

    Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models.
    (A small illustrative sketch of gradient descent on a wide two-layer ReLU network follows after this session's entries.)
    video
    slides
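    The following is a minimal sketch of full-batch gradient descent on a wide two-layer ReLU network (a positively homogeneous activation), loosely in the setting of the second talk of this session. It is a generic finite-width toy example, not the talk's infinite-width analysis; the 1/sqrt(m) output scaling, the step size and the toy data are arbitrary illustration choices.

```python
# Generic sketch: gradient descent on a wide two-layer ReLU network, 1D regression.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 1000                           # samples, hidden neurons ("wide" regime)
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=(n, 1))
X = np.c_[x, np.ones(n)]                   # append a constant input so kinks can move

W = rng.normal(size=(2, m))                # input-to-hidden weights
a = rng.normal(size=(m, 1)) / np.sqrt(m)   # output weights (one common scaling choice)
lr = 1e-3

for _ in range(3000):
    pre = X @ W                            # (n, m) pre-activations
    h = np.maximum(pre, 0.0)               # ReLU activations
    err = h @ a - y                        # (n, 1) residuals
    grad_a = h.T @ err / n                 # gradient of 0.5 * mean squared error w.r.t. a
    grad_W = X.T @ ((err @ a.T) * (pre > 0)) / n   # ... and w.r.t. W
    a -= lr * grad_a
    W -= lr * grad_W

mse = float(np.mean((np.maximum(X @ W, 0.0) @ a - y) ** 2))
print("training MSE:", mse)
```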
  • Jun 8, 2020: Session IV

    Michael Unser (Biomedical Imaging Group, EPFL, CH)
    Representer theorems for machine learning and inverse problems

    Regularization addresses the ill-posedness of the training problem in machine learning or the reconstruction of a signal from a limited number of measurements. The standard strategy consists in augmenting the original cost functional by an energy that penalizes solutions with undesirable behaviour. In this presentation, I will present a general representer theorem that characterizes the solutions of a remarkably broad class of optimization problems in Banach spaces and helps us understand the effect of regularization. I will then use the theorem to retrieve some classical characterizations, such as the celebrated representer theorem of machine learning for RKHS, Tikhonov regularization, and representer theorems for sparsity-promoting functionals, as well as a few new ones, including a result for deep neural networks.
    (A small illustrative sketch of the classical RKHS representer theorem, via kernel ridge regression, follows after this session's entries.)
    video
    slides
    Vincent Duval (Inria, FR)
    Representing the solutions of total variation regularized problems

    The total (gradient) variation is a regularizer which has been widely used in inverse problems arising in image processing, following the pioneering work of Rudin, Osher and Fatemi. In this talk, I will describe the structure of the solutions to total variation regularized variational problems when one has a finite number of measurements.
    First, I will present a general representation principle for the solutions of convex problems, then I will apply it to the total variation by describing the faces of its unit ball.

    This is joint work with Claire Boyer, Antonin Chambolle, Yohann De Castro, Frédéric de Gournay and Pierre Weiss.
    video
    slides
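    The following is a minimal sketch of the classical RKHS representer theorem recalled in the first talk of this session: the kernel ridge regression minimizer can be written as a finite linear combination of kernel functions centered at the data points, with coefficients obtained from a linear system. The Gaussian kernel, its bandwidth and the regularization parameter are arbitrary illustration choices.

```python
# Generic sketch: the RKHS representer theorem in action via kernel ridge regression.
# The minimizer has the form f(x) = sum_i alpha_i k(x_i, x).
import numpy as np

def k(x, y, sigma=0.3):
    """Gaussian kernel between two 1D sample arrays."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=40)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.normal(size=40)

lam = 1e-2
K = k(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)  # (K + lam I) alpha = y

x_test = np.linspace(0.0, 1.0, 5)
f_test = k(x_test, x_train) @ alpha        # finite kernel expansion evaluated at x_test
print(f_test)
```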
  • Jun 15, 2020: Session V

    Andrea Braides (Università Roma Tor Vergata, IT)
    Continuum limits of interfacial energies on (sparse and) dense graphs

    I review some results on the convergence of energies defined on graphs. My interest in such energies comes from models in Solid Mechanics (where the bonds in the graph represent the relevant atomistic interactions) or Statistical Physics (Ising systems), but the nodes of the graph can also be thought of as a collection of data on which the bonds describe some relation between the data.
    The typical objective is an approximate (simplified) continuum description of problems of minimal cut as the number N of the nodes of the graphs diverges.
    If the graphs are sparse (i.e. the number of bonds is much less than the total number of pairs of nodes as N goes to infinity), often (more precisely, when we have some control on the range or on the decay of the interactions) such minimal-cut problems translate into minimal-perimeter problems for sets or partitions on the continuum. This description is easily understood for periodic lattice systems, but carries over also to random distributions of nodes. In the case of a (locally) uniform Poisson distribution, the limit minimal-cut problems are actually described by more regular energies than in the periodic-lattice case.
    When we relax the hypothesis on the range of interactions, the description of the limit of sparse graphs becomes more complex, as it depends subtly on geometric characteristics of the graph, and is partially understood. Some easy examples show that, even though for the continuum limit we still remain in a similar analytical environment, the description as (sharp) interfacial energies can be lost in this case, and more “diffuse” interfaces must be taken into account.
    If instead we consider dense sequences of graphs (i.e., the number of bonds is of the same order as the total number of pairs as N goes to infinity), then a completely different limit environment must be used, that of graphons (which are abstract limits of graphs), for which sophisticated combinatorial results can be used. We can re-read the existing notion of convergence of graphs to graphons as a convergence of the related cut functionals to non-local energies on a simple reference parameter set. This convergence provides an approximate description of the corresponding minimal-cut problems.
    This is work in collaboration with Alicandro, Cicalese, Piatnitski and Solci (sparse graphs) and with Cermelli and Dovetta (dense graphs).
    video
    slides
    Nicolás García Trillos (University of Wisconsin-Madison, US)
    Regularity theory and uniform convergence in the large data limit of graph Laplacian eigenvectors on random data clouds

    Graph Laplacians are omnipresent objects in machine learning that have been used in supervised, unsupervised and semi-supervised settings due to their versatility in extracting local and global geometric information from data clouds. In this talk I will present an overview of how the mathematical theory built around them has gotten deeper and deeper, layer by layer, since the appearance of the first results on pointwise consistency in the 2000s, until the most recent developments; this line of research has found strong connections between PDEs built on proximity graphs on data clouds and PDEs on manifolds, and has given a more precise mathematical meaning to the task of “manifold learning”. In the first part of the talk I will highlight how ideas from optimal transport made possible some of the initial steps, which provided L2-type error estimates between the spectra of graph Laplacians and Laplace-Beltrami operators. In the second part of the talk, which is based on recent work with Jeff Calder and Marta Lewicka, I will present a newly developed regularity theory for graph Laplacians which, among other things, allows us to bootstrap the L2 error estimates developed through optimal transport and upgrade them to uniform convergence and almost C^{0,1} convergence rates. The talk can be seen as a tale of how a flow of ideas from optimal transport, PDEs and, in general, analysis has made possible a finer understanding of concrete objects popular in data analysis and machine learning.
    (A small illustrative sketch of graph Laplacian eigenvectors on a synthetic data cloud follows after this session's entries.)
    video
    slides
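    The following is a minimal sketch of graph Laplacian eigenvectors computed on a synthetic random data cloud, the kind of object studied in the second talk of this session. It is a generic construction, not the speakers' code; the noisy-circle data, the epsilon-graph weights and the graph size are arbitrary illustration choices.

```python
# Generic sketch: eigenvectors of an unnormalized graph Laplacian on a data cloud,
# built with Gaussian weights on an epsilon-neighborhood graph.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=400)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(400, 2))  # noisy circle

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
eps = 0.3
W = np.exp(-d2 / eps**2) * (d2 < eps**2)              # weights on the epsilon-graph
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W                        # unnormalized graph Laplacian

vals, vecs = np.linalg.eigh(L)                        # eigenpairs, ascending eigenvalues
print(vals[:5])                # smallest eigenvalues; the first is ~0 for a connected graph
# In the large-data limit, the low-lying eigenvectors (columns of vecs) approximate
# eigenfunctions of a Laplace-Beltrami-type operator on the underlying circle.
```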
  • Jun 29, 2020: Session VI

    Jana de Wiljes (Universität Potsdam, DE)
    Sequential learning for decision support under uncertainty

    In many application areas there is a need to determine a control variable that optimizes a pre-specified objective. This problem is particularly challenging when knowledge of the underlying dynamics is subject to various sources of uncertainty. Such a scenario arises, for instance, in the context of therapy individualization to improve the efficacy and safety of medical treatment. Mathematical models describing the pharmacokinetics and pharmacodynamics of a drug, together with data on associated biomarkers, can be leveraged to support decision-making by predicting therapy outcomes. We present a continuous learning strategy which follows a novel sequential Monte Carlo tree search approach and explore how the underlying uncertainties are reflected in the approximated control variable.
    video
    slides
    Björn Sprungk (TU Freiberg, DE)
    Noise-level robust Monte Carlo methods for Bayesian inference with informative data

    The Bayesian approach to inverse problems provides a rigorous framework for the incorporation and quantification of uncertainties in measurements, parameters and models. However, sampling from or integrating w.r.t. the resulting posterior measure can become computationally challenging. In recent years, a lot of effort has been spent on deriving dimension-independent methods and on combining efficient sampling strategies with multilevel or surrogate methods in order to reduce the computational burden of Bayesian inverse problems.
    In this talk, we are interested in designing numerical methods which are robust w.r.t. the size of the observational noise, i.e., methods which behave well in the case of concentrated posterior measures. The concentration of the posterior is a highly desirable situation in practice, since it relates to informative or large data. However, it can also pose a significant computational challenge for numerical methods based on the prior or reference measure. We propose to employ the Laplace approximation of the posterior as the base measure for numerical integration in this context. The Laplace approximation is a Gaussian measure centered at the maximum a-posteriori estimate (MAPE) and with covariance matrix depending on the Hessian of the log-posterior density at the MAPE. We discuss convergence results of the Laplace approximation in terms of the Hellinger distance and analyze the efficiency of Monte Carlo methods based on it. In particular, we show that Laplace-based importance sampling and quasi-Monte Carlo as well as Laplace-based Metropolis-Hastings algorithms are robust w.r.t. the concentration of the posterior for large classes of posterior distributions and integrands, whereas prior-based Monte Carlo sampling methods are not.
    (A small illustrative sketch of Laplace-based importance sampling for a toy posterior follows after this session's entries.)
    video
    slides
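    The following is a minimal sketch of Laplace-based importance sampling for a one-dimensional toy Bayesian inverse problem, in the spirit of the second talk of this session. It is a generic construction, not the speaker's code; the forward map u -> u**3, the standard normal prior, the noise level and the sample size are arbitrary illustration choices.

```python
# Generic sketch: Laplace approximation of a 1D posterior used as an importance
# sampling proposal (toy model with a single informative observation).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y_obs, noise = 1.2, 0.05                       # one noisy observation, small noise level

def neg_log_post(u):
    # standard normal prior, nonlinear forward map G(u) = u**3, Gaussian noise
    return 0.5 * u**2 + 0.5 * ((y_obs - u**3) / noise) ** 2

# Laplace approximation: Gaussian centered at the MAP with Hessian-based variance.
u_map = minimize_scalar(neg_log_post).x
h = 1e-4
hess = (neg_log_post(u_map + h) - 2.0 * neg_log_post(u_map) + neg_log_post(u_map - h)) / h**2
sigma_lap = 1.0 / np.sqrt(hess)

# Importance sampling with the Laplace approximation as proposal.
u = u_map + sigma_lap * rng.normal(size=20000)
log_w = -neg_log_post(u) + 0.5 * ((u - u_map) / sigma_lap) ** 2 + np.log(sigma_lap)
w = np.exp(log_w - log_w.max())                # self-normalized importance weights
w /= w.sum()
posterior_mean = np.sum(w * u)
ess = 1.0 / np.sum(w**2)                       # effective sample size
print(posterior_mean, ess)
```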

Organizers

Leon Bungert
Martin Burger
Antonio Esposito
Janic Föcke
Daniel Tenbrinck
Philipp Wacker

Other One World Seminars