Wednesday Seminars | Department of Computer Science | University of Cambridge

Geometric Gaussian Processes

Viacheslav (Slava) Borovitskiy

https://vab.im

Outline

Background: 

Probabilistic models

  • Motivation: Model-based decision-making
  • Basic types of probabilistic models
  • Probabilistic models on geometric domains
  • Motivating geometric Gaussian processes: A benchmark

Geometric Gaussian processes

  • Theory
  • Various geometric settings
  • Implementation

Summary

Probabilistic models

Motivation:

Model-based decision-making

Model-based decision-making

Probabilistic models: input distribution (prediction + uncertainty)

Allows computing queries like probability of improvement: $\!\alpha(x) \!=\! \mathbb{P} \! \del{f(x) \!>\!\! f^*}$

They can drive selection of the next point to evaluate [Bayesian optimization]

Basic types of probabilistic models

Basic types of probabilistic models

Bayesian neural networks

Deep ensembles

$\underbrace{\hphantom{\text{Bayesian Neural Networks}\qquad{Deep ensembles}}}_{\text{defined by architectures}}$

Gaussian processes

$\underbrace{\hphantom{\text{Gaussian Processes}}}_{\text{defined by kernels}}$

Basic types of probabilistic models: Discussion

Gaussian processes (GPs) — gold standard.
In practice: General-purpose Gaussian processes (RBF and Matérn kernels).

  • Inductive bias:
    • Stationarity: training on a shifted data produces a shifted model.
    • Smoothness: continuity and differentiability.

  • Scope:
    • Work very well in low dimensions (e.g. tabular data).
    • Struggle in high dimensions (e.g. image / text data).
      Though deep ensembles and Bayesian neural networks struggle too!

Geometric GPs: extend scope to geometric settings.

Probabilistic models

on 

Geometric domains

Geometric domains

Manifolds
e.g. for robotics
(analytic or meshes)

Graphs
e.g. for road networks
(graphs: transductive)

Spaces of graphs
e.g. for drug design
(graphs: inductive)

Example graph learning problem

Example graph learning problem

: Node regression

$$ f : V \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g1.svg}}\Big) = 2 $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g2.svg}}\Big) = 3 $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g3.svg}}\Big) = 5 $$

Example graph learning problem: Probabilistic node regression

$$ f : V \to \text{distributions over } \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g1.svg}}\Big) = \f{N}(2, 0.5^2) $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g2.svg}}\Big) = \f{N}(3, 0.2^2) $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g3.svg}}\Big) = \f{N}(5, 0.7^2) $$

Benchmark

Reminder: Basic types of probabilistic models

Bayesian neural networks

Deep ensembles

$\underbrace{\hphantom{\text{Bayesian Neural Networks}\qquad{Deep ensembles}}}_{\text{defined by architectures}}$

Gaussian processes

$\underbrace{\hphantom{\text{Gaussian Processes}}}_{\text{defined by kernels}}$

Benchmark: Data

Benchmark: Caltrans Performance Measurement System (PeMS)

San Jose highway network: graph with 1016 nodes

325 labeled nodes with known traffic speed in miles per hour

Use 250 labeled nodes for training data and 75 for test data

Dataset details: Borovitskiy et al. (AISTATS 2021)

Benchmark: Results

Trivial baseline

Trivial baseline

: prediction

: uncertainty

Results

Results

Figure: Uncertainty estimates of a geometric Gaussian process.     

This benchmark: geometric GP is best, GNN ensemble fails.

Geometric Gaussian processes

Background: Gaussian processes (GPs)

Gaussian processes — random functions with jointly Gaussian marginals.

Bayesian learning: prior GP + data = posterior GP

Prior GPs are determined by kernels (covariance functions)
— have to be positive semi-definite (PSD),
— not all kernels define "good" GPs.

Background: General-purpose GPs in $\R^d$

Matérn kernels:

$$ \htmlData{class=fragment fade-out,fragment-index=6}{ \footnotesize \mathclap{ k_{\nu, \kappa, \sigma^2}(x,x') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \del{\sqrt{2\nu} \frac{\abs{x-x'}}{\kappa}}^\nu K_\nu \del{\sqrt{2\nu} \frac{\abs{x-x'}}{\kappa}} } } \htmlData{class=fragment d-print-none,fragment-index=6}{ \footnotesize \mathclap{ k_{\infty, \kappa, \sigma^2}(x,x') = \sigma^2 \exp\del{-\frac{\abs{x-x'}^2}{2\kappa^2}} } } $$

$\sigma^2$: variance

$\kappa$: length scale

$\nu$: smoothness

$\nu\to\infty$: RBF kernel (Gaussian, heat, diffusion)

$\nu = 1/2$

$\nu = 3/2$

$\nu = 5/2$

$\nu = \infty$

Can we generalize these to geometric domains?

Distance-based approach

$$ k_{\infty, \kappa, \sigma^2}(x,x') = \sigma^2\exp\del{-\frac{|x - x'|^2}{2\kappa^2}} $$

$$ k_{\infty, \kappa, \sigma^2}^{(d)}(x,x') = \sigma^2\exp\del{-\frac{d(x,x')^2}{2\kappa^2}} $$

Graph nodes: not PSD for some $\kappa$ unless can be isometrically embedded into a Hilbert space.

Schoenberg (Trans. Am. Math. Soc. 1938).

Manifolds: not PSD for some $\kappa$ (in some cases—for all) unless the manifold is isometric to $\mathbb{R}^d$.

Feragen et al. (CVPR 2015) and Da Costa et al. (SIMODS 2025).

Not PSD on the square graph.

Not PSD on the circle manifold.

Another approach: stochastic partial differential equations

$$ \htmlData{class=fragment,fragment-index=0}{ \ubr{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}} f = \c{W}}{\t{Matérn}} } $$

$\Delta$: Laplacian

$\c{W}$: Gaussian white noise

$d$: dimension

Generalizes well

Same inductive biases: stationary and smooth

Implicit: no formula for the kernel

Whittle (ISI 1963)

Lindgren et al. (JRSSB 2011)

Making it explicit:
node sets of graphs

Matérn kernels on weighted undirected graphs

SPDE turns into a stochastic linear system: solution has kernel $$ \htmlData{fragment-index=2,class=fragment}{ k_{\nu, \kappa, \sigma^2}(i, j) = \frac{\sigma^2}{C_{\nu, \kappa}} \sum_{n=0}^{\abs{V}-1} \Phi_{\nu, \kappa}(\lambda_n) \v{f_n}(i)\v{f_n}(j) } $$

$\lambda_n, \v{f_n}$: eigenvalues and eigenvectors of the Laplacian matrix

$$ \htmlData{fragment-index=4,class=fragment}{ \Phi_{\nu, \kappa}(\lambda) = \begin{cases} \htmlData{fragment-index=5,class=fragment}{ \del{\frac{2\nu}{\kappa^2} - \lambda}^{-\nu} } & \htmlData{fragment-index=5,class=fragment}{ \nu\lt\infty : \t{Matérn} } \\ \htmlData{fragment-index=6,class=fragment}{ e^{-\frac{\kappa^2}{2} \lambda} } & \htmlData{fragment-index=6,class=fragment}{ \nu = \infty : \t{heat (sq. exp., RBF)} } \end{cases} } $$

Examples: $k_{5/2}(\htmlStyle{color:rgb(242, 249, 88)!important;}{\bullet},\.)$ on node sets of different graphs

Making it explicit:
Other settings

Making it explicit: Other settings

Compact manifolds: $$ \htmlData{class=fragment}{ \begin{aligned} k_{\nu, \kappa, \sigma^2}(x,x') &= \frac{\sigma^2}{C_{\nu, \kappa}} \sum_{n=0}^\infty \Phi_{\nu, \kappa}(\lambda_n) f_n(x) f_n(x') \\ &= \frac{\sigma^2}{C_{\nu, \kappa}} \sum_{l=0}^\infty \Phi_{\nu, \kappa}(\lambda_l) \underbrace{\sum_{s=1}^{d_s} f_{l, s}(x) f_{l, s}(x')}_{G_l(x, x')} \end{aligned} } $$

Infinite summation where $\lambda_n, f_n$ are Laplace–Beltrami eigenpairs

Addition theorems: more efficient $G_l(x, x')$-based expressions

Making it explicit: Other settings

Non-compact manifolds:

  • No summation: instead, integral approximated via Monte Carlo

Spaces of graphs:

  • Geometry: make a set of graphs to be a node set of a larger graph
  • Then, heat and Matérn kernels are automatically defined
  • Larger graph is huge: need clever computational techniques

Gaussian processes:

Various geometric settings

Gaussian processes:

 Various geometric settings

Simple manifolds:
meshes, spheres, tori

Borovitskiy et al. (AISTATS 2020)

Compact Lie groups and
homogeneous spaces:
$\small\f{SO}(n)$, resp. $\small\f{Gr}(n, r)$

Azangulov et al. (JMLR 2024)

Non-compact symmetric
spaces: $\small\bb{H}_n$ and $\small\f{SPD}(n)$

Azangulov et al. (JMLR 2024)

Gaussian processes: Various geometric settings

Vector fields on simple manifolds

Robert-Nicoud et al. (AISTATS 2024)

Gaussian processes: Various geometric settings

Edges of graphs

Yang et al. (AISTATS 2024)

Metric graphs

Bolin et al. (Bernoulli 2024)

Cellular Complexes

Alain et al. (ICML 2024)

Spaces of Graphs

Borovitskiy et al. (AISTATS 2023)

Gaussian processes: Various geometric settings

Semi-supervised learning:
scalar functions

Fichera et al. (NeurIPS 2023)

Semi-supervised learning:
vector fields

Peach et al. (ICLR 2024)

Implementation

Implementation

Implementation: Details

Supported spaces:

Supported backends: PyTorch, JAX, TensorFlow, NumPy.

Summary

Summary

Geometric machine learning: increasingly popular

  • Waves: kernels on structured data, geometric deep learning

Geometric probabilistic models: emerging research area

  • Geometric probabilistic > geometric + probabilistic
  • Geometric Gaussian processes: strong off-the-shelf baselines