UAI 2024 Tutorial

Geometric Probabilistic Models

Viacheslav (Slava) Borovitskiy

Alexander Terenin

UAI 2024 Tutorial

Geometric Probabilistic Models

Part I

Alexander Terenin

Part I: background, motivation, technical fundamentals

Model-based decision-making
Probabilistic modeling and uncertainty
Geometric learning: working with structured data

Part II: geometric probabilistic models

Three types of probabilistic models: an example problem
Deep dive: geometric Gaussian processes
Emerging applications
Software
Summary

Model-based decision-making

: Bayesian optimization

Automatic explore-exploit tradeoff

Principle of Separation:

prediction and decision-making

Probabilistic models

Predictions and uncertainty

Extrapolation and uncertainty

Neural network baseline

Extrapolation and uncertainty

Gaussian process: squared exponential kernel

Extrapolation and uncertainty

Gaussian process: polynomial kernel

Extrapolation and uncertainty

Neural network ensemble

Extrapolation and uncertainty

Bayesian neural network using MCMC

Extrapolation and uncertainty

Gaussian process: stationary kernel

Gaussian process: polynomial kernel

Ensemble

Bayesian neural network

Different models: different behavior

Goal: reasonable control of model's uncertainty behavior

Vague idea: regulate uncertainty using symmetries

Geometric Learning

What are we trying to do?

Non-linear regression and classification: $$ \htmlClass{fragment}{ \min_{f\in\c{F}} \sum_{i=1}^N L(y_i, f(x_i)) } $$ where

$(x_i,y_i)$: training data
$L$: loss function, such as square loss $\norm{y_i - f(x_i)}^2$
$\c{F}$: some space of functions $f : X \to Y$

Spaces $X$ and/or $Y$ carry geometric structure

What are we trying to do?

Probabilistic non-linear regression and classification: $$ \htmlClass{fragment}{ y_i \~ p_{y\given f}(\.\given f(x_i)) } $$ where

$(x_i,y_i)$: training data
$p_{y\given f}$: likelihood, such as Gaussian $y_i \~\f{N}(f(x_i), \sigma^2)$
Bayesian approach: place a prior $p_f$ over functions $f : X \to Y$

Most models can be formulated in both ways

Spaces $X$ and $Y$ carry geometric structure

What are we trying to do? An example: graphs

$$ f : G \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/part_1/g1.svg}}\Big) \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/part_1/g2.svg}}\Big) \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/part_1/g3.svg}}\Big) \to \R $$

Geometric learning: settings

Meshes

Graphs

Manifolds

Vector Fields

And many more!

Challenge: how do we represent data?

Graph: $G = (V,E)$ where

$V$: set of vertices
$E \subseteq V \x V$: edges

Numerical representation:

$\v{V}$: integers
$\m{E}$: binary matrix

Different orderings represent same graph $\leadsto$ permutation invariance

$\v{V} \leftrightsquigarrow \m{P}\v{V}$

$\m{E} \leftrightsquigarrow \m{P}\m{E}\m{P}^T$

$\m{P}$: permutation matrix

Challenge: how do we represent data?

Two ways of representing points on a sphere:

Points in $\R^3$: $(x,y,z) \in \R^3$ with $x^2 + y^2 + z^2 = 1$
Angles: $(\theta,\phi)$ representing angle relative to a chosen point

Different numbers: same point on sphere

Predictions should depend on our data, not on how we store it numerically

Challenge: how do we represent data?

$ \large\x\, $

$ \large= $

Same vector field: represented differently in different frames

Predictions should depend on our data, not on the choice of frame

Symmetries: predictions that depend only on data

Need models that respect representation's symmetries

Invariance

Equivariance

$ \x\, $

$ = $

Transformations of representation don't affect output

Transformations correctly carry over from input to output

Challenge: model design

Symmetries: extrapolation and uncertainty

Design principle: models which respect the space's symmetries

Symmetry-based extrapolation
Symmetry-based uncertainty

..unless data indicates otherwise?

Just as relevant in classical Euclidean setting

Geometric learning: right technical language for studying this

Geometric deep learning: models

Graph neural networks

Geometric analogs of ConvNets
Most-common model by far

Idea:

Pixel-space convolution: discretization of Euclidean convolution
Replace Euclidean convolution with non-Euclidean analog
Discretize: obtain graph neural network architecture

Challenge: what is the right notion of convolution?

Geometric deep learning: models

Other models:

Attention and graph transformers
Invariant and equivariant networks
Lots of other ideas

Check out the recent book, available with slides and other content at https://geometricdeeplearning.com

Applications

of Geometric Learning

Applications

: drug discovery

Graph neural network: predicted antibiotic activity in halicin, prev. studied for diabetes

Shown in mice to have broad-spectrum antibiotic activity

Figure and results: Stokes et al. (Cell, 2020)

Applications: recommendation systems

Graph neural networks: used to improve complementary product recommendation

Improved performance in production compared to Amazon's prior approach

Figure and results: Hao et al. (CIKM 2020)

Applications: traffic prediction

Graph neural networks: Google Maps traffic ETA prediction

Improved accuracy in production compared to prior approach

Figure and results: Derrow-Pinion et al. (CIKM 2021)

UAI 2024 Tutorial

Geometric Probabilistic Models

Viacheslav (Slava) Borovitskiy

Alexander Terenin

UAI 2024 Tutorial

Geometric Probabilistic Models

Part II

Viacheslav (Slava) Borovitskiy

Part I: background, motivation, technical fundamentals

Model-based decision-making
Probabilistic modeling and uncertainty
Geometric learning: working with structured data

Part II: geometric probabilistic models

Three types of probabilistic models: an example problem
Deep dive: geometric Gaussian processes
Emerging applications
Software
Summary

Basic types of probabilistic models

Bayesian Neural Networks

Deep Ensembles

$\underbrace{\hphantom{\text{Bayesian Neural Networks}\qquad{Deep Ensembles}}}_{\text{require specialized architectures}}$

Gaussian Processes

$\underbrace{\hphantom{\text{Gaussian Processes}}}_{\text{require specialized kernels}}$

An example problem

San Jose highway network: graph with 1016 nodes

325 labeled nodes with known traffic speed in miles per hour

Use 250 labeled nodes for training data and 75 for test data

Dataset details: Borovitskiy et al. (AISTATS 2021)

Simple baseline

: prediction

: uncertainty

Trying out different probabilistic models

Link to code at the end of the tutorial

Sanity Check:

Deterministic Graph CNN

: prediction (1 layer)

: prediction (7 layers)

Probabilistic models

Deep Ensemble

: prediction

: uncertainty

Bayesian Graph CNN

: prediction (100 neurons/layer)

: uncertainty (100 neurons/layer)

: prediction (10 neurons/layer)

: uncertainty (10 neurons/layer)

Gaussian Process

: geometric Matérn kernel

– prediction

– uncertainty

Not all Gaussian processes
work well

Gaussian Process

: geometric Matérn kernel, uncertainty

: random walk kernel, uncertainty

: inverse cosine kernel, uncertainty

A way to think about this

This example: easy-to-use and visualize benchmark for regression on geometric data

Geometric Gaussian processes:

Reliable off-the-shelf probabilistic models
Solid baselines to benchmark other models against

Deep Dive:
Geometric Gaussian Processes

Geometric (non-Euclidean) domains

Graphs

road networks

Manifolds

physics, robotics

Spaces of graphs

drug design

Standard kernels in $\bb{R}^n$

Matérn kernels

$$ \htmlData{class=fragment fade-out,fragment-index=6}{ \footnotesize \mathclap{ k_{\nu, \kappa, \sigma^2}(x,x') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \del{\sqrt{2\nu} \frac{\abs{x-x'}}{\kappa}}^\nu K_\nu \del{\sqrt{2\nu} \frac{\abs{x-x'}}{\kappa}} } } \htmlData{class=fragment d-print-none,fragment-index=6}{ \footnotesize \mathclap{ k_{\infty, \kappa, \sigma^2}(x,x') = \sigma^2 \exp\del{-\frac{\abs{x-x'}^2}{2\kappa^2}} } } $$

$\sigma^2$: variance

$\kappa$: length scale

$\nu$: smoothness

$\nu\to\infty$: RBF kernel (Gaussian, heat, diffusion)

$\nu = 1/2$

$\nu = 3/2$

$\nu = 5/2$

$\nu = \infty$

How to generalize these to non-Euclidean settings?

Distance-based approach

$$ k_{\infty, \kappa, \sigma^2}(x,x') = \sigma^2\exp\del{-\frac{|x - x'|^2}{2\kappa^2}} $$

$$ k_{\infty, \kappa, \sigma^2}^{(d)}(x,x') = \sigma^2\exp\del{-\frac{d(x,x')^2}{2\kappa^2}} $$

Geometry-aware, but...

Manifolds: not well-defined unless the manifold is isometric to a Euclidean space

Feragen et al. (CVPR 2015)

Graphs: not well-defined unless nodes can be isometrically embedded into a Hilbert space

Schoenberg (Trans. Am. Math. Soc. 1938)

Spaces of graphs: what is a space of graphs?

Need a different generalization

Another approach: stochastic partial differential equations

$$ \htmlData{class=fragment,fragment-index=0}{ \ubr{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}} f = \c{W}}{\t{Matérn}} } $$

$\Delta$: Laplacian

$\c{W}$: Gaussian white noise

$d$: dimension

Generalizes well

Geometry-aware: respects to symmetries of the space

Implicit: no formula for the kernel

Whittle (ISI 1963)

Lindgren et al. (JRSSB 2011)

Making it explicit:
node sets of graphs

Motivating example: traffic speed modeling

Matérn kernels on weighted undirected graphs

SPDE turns into a stochastic linear system: solution has kernel $$ \htmlData{fragment-index=2,class=fragment}{ k_{\nu, \kappa, \sigma^2}(i, j) = \frac{\sigma^2}{C_{\nu, \kappa}} \sum_{n=0}^{\abs{V}-1} \Phi_{\nu, \kappa}(\lambda_n) \v{f_n}(i)\v{f_n}(j) } $$

$\lambda_n, \v{f_n}$: eigenvalues and eigenvectors of the Laplacian matrix

$$ \htmlData{fragment-index=4,class=fragment}{ \Phi_{\nu, \kappa}(\lambda) = \begin{cases} \htmlData{fragment-index=5,class=fragment}{ \del{\frac{2\nu}{\kappa^2} - \lambda}^{-\nu} } & \htmlData{fragment-index=5,class=fragment}{ \nu\lt\infty : \t{Matérn} } \\ \htmlData{fragment-index=6,class=fragment}{ e^{-\frac{\kappa^2}{2} \lambda} } & \htmlData{fragment-index=6,class=fragment}{ \nu = \infty : \t{heat (sq. exp., RBF)} } \end{cases} } $$

Examples: $k_{5/2}(\htmlStyle{color:rgb(242, 249, 88)!important;}{\bullet},\.)$ on node sets of different graphs

Making it explicit:
other settings

Making it explicit: other settings

Compact manifolds: $$ \htmlData{class=fragment}{ \begin{aligned} k_{\nu, \kappa, \sigma^2}(x,x') &= \frac{\sigma^2}{C_{\nu, \kappa}} \sum_{n=0}^\infty \Phi_{\nu, \kappa}(\lambda_n) f_n(x) f_n(x') \\ &= \frac{\sigma^2}{C_{\nu, \kappa}} \sum_{l=0}^\infty \Phi_{\nu, \kappa}(\lambda_l) \underbrace{\sum_{s=1}^{d_s} f_{l, s}(x) f_{l, s}(x')}_{G_l(x, x')} \end{aligned} } $$

Infinite summation where $\lambda_n, f_n$ are Laplace–Beltrami eigenpairs

Addition theorems: more efficient $G_l(x, x')$-based expressions

Making it explicit: other settings

Non-compact manifolds:

No summation: instead, integral approximated via Monte Carlo

Spaces of graphs:

Geometry: make a set of graphs to be a node set of a larger graph
Then, heat and Matérn kernels are automatically defined
Larger graph is huge: need clever computational techniques

Gaussian processes:

other geometric settings

Gaussian processes:

other geometric settings

Simple manifolds:
meshes, spheres, tori

Borovitskiy et al. (AISTATS 2020)

Compact Lie groups and
homogeneous spaces:
$\small\f{SO}(n)$, resp. $\small\f{Gr}(n, r)$

Azangulov et al. (JMLR 2024)

Non-compact symmetric
spaces: $\small\bb{H}_n$ and $\small\f{SPD}(n)$

Azangulov et al. (JMLR 2024)

Gaussian processes: other geometric settings

Vector fields on simple manifolds

Robert-Nicoud et al. (AISTATS 2024)

Gaussian processes: other geometric settings

Edges of graphs

Yang et al. (AISTATS 2024)

Metric graphs

Bolin et al. (Bernoulli 2024)

Cellular Complexes

Alain et al. (ICML 2024)

Spaces of Graphs

Borovitskiy et al. (AISTATS 2023)

Implicit geometry

Semi-supervised learning:
scalar functions

Fichera et al. (NeurIPS 2023)

Semi-supervised learning:
vector fields

Peach et al. (ICLR 2024)

Theoretical results

Hyperparameter inference

Li et al. (JMLR 2024)

Minimax errors

Rosa et al. (NeurIPS 2023)

Emerging real-world applications

Robotics: Bayesian optimization of control policies

Joint postures: $\mathbb{T}^d$

Rotations: $\mathbb{S}_3$, $\operatorname{SO}(3)$

Manipulability: $\operatorname{SPD(3)}$

Jaquier et al. (CoRL 2019, CoRL 2021), figures by the authors

Medicine: probabilistic atrial activation maps

Prediction

Uncertainty

Matérn Gaussian processes on meshes (independently proposed)

Coveney et al. (TBE 2019, PTRSA 2020), figures by the authors

Engineering: Bayesian optimization of shapes

Based on ensembles of geodesic CNNs

Neural Concept, video from neuralconcept.com

Join us!

Lots of worthwhile problems to study!

Software

Example Problem: Code

Stack:

PyTorch Geometric: graph neural networks
Pyro: probabilistic programming
GPyTorch: Gaussian processes
GeometricKernels: geometric Matérn kernels

Implementation:
Colab notebook

GeometricKernels: Matérn and heat kernels on geometric domains

https://geometric-kernels.github.io/

Matérn and Heat Kernels on Geometric Domains

Example code:

>>> # Define a space (2-dim sphere).
>>> hypersphere = Hypersphere(dim=2)

>>> # Initialize kernel.
>>> kernel = MaternGeometricKernel(hypersphere)
>>> params = kernel.init_params()

>>> # Compute and print out the 3x3 kernel matrix.
>>> xs = np.array([[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]])
>>> print(kernel.K(params, xs))

[[1.    0.356     0.356]
	[0.356 1.        0.356]
	[0.356 0.356 1.       ]]

Multi-backend: Numpy, JAX, PyTorch, TensorFlow

Software and frameworks

	PyTorch	JAX	TensorFlow
Graph Neural Networks	PyTorch Geometric	Jraph	TensorFlow GNN
Probabilistic Programming	Pyro	NumPyro	TensorFlow Probability
Gaussian Processes	GPyTorch	GPJax	GPflow
Geometric Kernels	GeometricKernels

Neural networks beyond the graph setting: various small libraries

Summary

Geometric machine learning: increasingly popular

Waves: kernels on structured data, geometric deep learning

Geometric probabilistic models: emerging research area

Geometric Gaussian processes: solid off-the-shelf baselines
Increasingly available but actively developing: join us!

Fully Funded PhD in ML @ the University of Edinburgh

Starting 2025. Follow/contact @vabor112.

Thank you!

@vabor112 · @avt_im

Thank you!

@vabor112 · @avt_im

Slides at:

References

References: geometric deep learning and applications

M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. Available at: https://geometricdeeplearning.com, 2021.

J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, V. M. Tran, A. Chiappino-Pepe, A. H. Badran, I. W. Andrews, E. J. Chory, G. M. Church, E. D. Brown, T. S. Jaakkola, R. Barzilay, and J. J. Collins. A deep learning approach to antibiotic discovery. Cell, 2020.

J. Hao, T. Zhao, J. Li, X. L. Dong, C. Faloutsos, Y. Sun, and W. Wang. P-companion: A principled framework for diversiﬁed complementary product recommendation. Conference on Information and Knowledge Management, 2020.

A. Derrow-Pinion, J. She, D. Wong, O. Lange, T. Hester, L. Perez, M. Nunkesser, S. Lee, X. Guo, B. Wiltshire, P. W. Battaglia, V. Gupta, A. Li, Z. Xu, A. Sanchez-Gonzalez, Y. Li, and P. Veličković. ETA prediction with graph neural networks in Google Maps. Conference on Information and Knowledge Management, 2021.

References: probabilistic neural networks in geometic settings

Y. Zhang, S. Pal, M. Coates, and D. Ustebay. Bayesian graph convolutional neural networks for semi-supervised classification. AAAI Conference on Artificial Intelligence, 2019.

M. Stadler, B. Charpentier, S. Geisler, D. Zügner, and S. Günnemann. Graph posterior network: Bayesian predictive uncertainty for node classification. Advances in Neural Information Processing Systems, 2021.

H. Shi, X. Zhang, S. Sun, L. Liu, and L. Tang. A survey on Bayesian graph neural networks. Intelligent Human-Machine Systems and Cybernetics, 2021.

A. Hasanzadeh, E. Hajiramezanali, S. Boluki, M. Zhou, N. Duﬃeld, K. Narayanan, and X. Qian. Bayesian graph neural networks with adaptive connection sampling. International Conference on Machine Learning, 2020.

N. Durasov, A. Lukoyanov, J. Donier, and P. Fua. DEBOSH: Deep bayesian shape optimization. Technical Report: Neural Concept, 2021.

Q. Wang, S. Wang, D. Zhuang, H. Koutsopoulos, and J. Zhao. Uncertainty quantiﬁcation of spatiotemporal travel demand with probabilistic graph neural networks. IEEE Transactions on Intelligent Transportation Systems, 2024.

L. Kahle and F. Zipoli. Quality of uncertainty estimates from neural network potential ensembles. Physical Review E, 2022.

References: geometric Gaussian processes

V. Borovitskiy, I. Azangulov, A. Terenin, P. Mostowsky, M. P. Deisenroth, and N. Durrande. Matérn Gaussian Processes on Graphs. Artificial Intelligence and Statistics, 2021.

A. Feragen, F. Lauze, and S. Hauberg. Geodesic exponential kernels: When curvature and linearity conﬂict. Computer Vision and Pattern Recognition, 2015.

I. J. Schoenberg. Metric spaces and positive deﬁnite functions. Transactions of the American Mathematical Society, 1938.

P. Whittle. Stochastic-processes in several dimensions. Bulletin of the International Statistical Institute, 1963.

F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian ﬁelds and Gaussian Markov random ﬁelds: the stochastic partial diﬀerential equation approach. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 2011.

V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth. Matérn Gaussian Processes on Riemannian Manifolds. Advances in Neural Information Processing Systems, 2020.

I. Azangulov, A. Smolensky, A. Terenin, and V. Borovitskiy. Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case. Journal of Machine Learning Research, 2024.

I. Azangulov, A. Smolensky, A. Terenin, and V. Borovitskiy. Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces II: non-compact symmetric spaces, 2024.

D. Robert-Nicoud, A. Krause, and V. Borovitskiy. Intrinsic Gaussian Vector Fields on Manifolds. Artificial Intelligence and Statistics, 2024.

M. Yang, V. Borovitskiy, and E. Isuﬁ. Hodge-Compositional Edge Gaussian Processes. Artificial Intelligence and Statistics, 2024.

D. Bolin, A. B. Simas, and J. Wallin. Gaussian Whittle-Matérn fields on metric graphs. Bernoulli, 2024.

M. Alain, S. Takao, B. Paige, and M. P. Deisenroth. Gaussian Processes on Cellular Complexes. International Conference on Machine Learning, 2024.

References: geometric Gaussian processes

V. Borovitskiy, M. R. Karimi, V. R. Somnath, and A. Krause. Isotropic Gaussian Processes on Finite Spaces of Graphs. Artificial Intelligence and Statistics, 2023.

B. Fichera, V. Borovitskiy, A. Krause, and A. Billard. Implicit Manifold Gaussian Process Regression. Advances in Neural Information Processing Systems, 2023.

R. Peach, M. Vinao-Carl, N. Grossman, and M. David. Implicit Gaussian process representation of vector fields over arbitrary latent manifolds. International Conference on Learning Representations, 2024.

D. Li, W. Tang, and S. Banerjee. Inference for Gaussian Processes with Matern Covariogram on Compact Riemannian Manifolds. Journal of Machine Learning Research, 2023.

P. Rosa, V. Borovitskiy, A. Terenin, and J. Rousseau. Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds. Advances in Neural Information Processing Systems, 2023.

N. Jaquier, V. Borovitskiy, A. Smolensky, A. Terenin, T. Asfour, and L. Rozo. Geometry-aware Bayesian Optimization in Robotics using Riemannian Matérn Kernels. Conference on Robot Learning, 2021.

N. Jaquier, L. Rozo, S. Calinon, and M. Bürger. Bayesian optimization meets Riemannian manifolds in robot learning. Conference on Robot Learning, 2020.

S. Coveney, C. Corrado, C. H. Roney, R. D. Wilkinson, J. E. Oakley, F. Lindgren, S. E. Williams, M. D. O'Neill, S. A. Niederer, and R. H. Clayton. Probabilistic interpolation of uncertain local activation times on human atrial manifolds. IEEE Transactions on Biomedical Engineering, 2019.

S. Coveney, C. Corrado, C. H. Roney, D. OHare, S. E. Williams, M. D. O'Neill, S. A. Niederer, R. H. Clayton, J. E. Oakley, and R. D. Wilkinson. Gaussian process manifold interpolation for probabilistic atrial activation maps and uncertain conduction velocity. Philosophical Transactions of the Royal Society A, 2020.