HEC-DSA, Université de Lausanne
Gaussian Processes
in Machine Learning
Viacheslav Borovitskiy (Slava)

Definition. A Gaussian process is a random function $f : X \times \Omega \to \mathbb{R}$ such that for any $x_1, \ldots, x_n$, the vector $(f(x_1), \ldots, f(x_n))$ is multivariate Gaussian.
The distribution of a Gaussian process is characterized by its mean function $m(x) = \mathbb{E}\, f(x)$ and its covariance kernel $k(x, x') = \mathrm{Cov}(f(x), f(x'))$.
Notation: $f \sim \mathrm{GP}(m, k)$.
The kernel $k$ must be positive (semi-)definite, i.e. for all $x_1, \ldots, x_n \in X$ the matrix $K_{\mathbf{xx}} := \{k(x_i, x_j)\}_{1 \le i, j \le n}$ must be positive (semi-)definite.
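A quick numerical illustration of these finite-dimensional marginals, as a minimal NumPy sketch; the RBF kernel and the grid of points below are illustrative choices, not part of the definition:

```python
import numpy as np

def rbf_kernel(a, b, sigma2=1.0, kappa=0.5):
    # k(x, x') = sigma^2 exp(-(x - x')^2 / (2 kappa^2))
    return sigma2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * kappa**2))

x = np.linspace(0.0, 1.0, 50)
K = rbf_kernel(x, x)

# Positive semi-definiteness: all eigenvalues of K_xx are >= 0
# (up to floating-point error).
assert np.linalg.eigvalsh(K).min() > -1e-9

# A draw of (f(x_1), ..., f(x_n)) is an ordinary multivariate Gaussian
# sample; the tiny jitter keeps the covariance numerically PSD.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros_like(x), K + 1e-10 * np.eye(len(x)))
```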
Conditioning takes the prior $\mathrm{GP}(m, k)$ together with observed data, giving the posterior (conditional) Gaussian process $\mathrm{GP}(\hat m, \hat k)$.
The functions $\hat m$ and $\hat k$ may be explicitly expressed in terms of $m$ and $k$.
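Explicitly, for noisy observations $\mathbf{y} = f(\mathbf{x}) + \varepsilon$ with $\varepsilon \sim \mathrm{N}(0, \sigma_n^2 I)$, the standard formulas are $\hat m(\cdot) = m(\cdot) + K_{\cdot \mathbf{x}} (K_{\mathbf{xx}} + \sigma_n^2 I)^{-1} (\mathbf{y} - m(\mathbf{x}))$ and $\hat k(\cdot, \cdot') = k(\cdot, \cdot') - K_{\cdot \mathbf{x}} (K_{\mathbf{xx}} + \sigma_n^2 I)^{-1} K_{\mathbf{x} \cdot'}$. A minimal NumPy sketch of these formulas, assuming a zero prior mean and an illustrative RBF kernel:

```python
import numpy as np

def rbf_kernel(a, b, sigma2=1.0, kappa=0.5):
    """k(x, x') = sigma^2 exp(-(x - x')^2 / (2 kappa^2)), an illustrative choice."""
    return sigma2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * kappa**2))

def gp_posterior(x_train, y_train, x_test, noise=1e-2, k=rbf_kernel):
    """Posterior mean m_hat and covariance k_hat of GP(0, k) given y = f(x) + eps."""
    K = k(x_train, x_train) + noise * np.eye(len(x_train))  # K_xx + sigma_n^2 I
    K_star = k(x_test, x_train)                             # cross-covariances
    mean = K_star @ np.linalg.solve(K, y_train)             # posterior mean
    cov = k(x_test, x_test) - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov

x_train = np.array([0.1, 0.4, 0.9])
y_train = np.sin(2 * np.pi * x_train)
mean, cov = gp_posterior(x_train, y_train, np.linspace(0.0, 1.0, 100))
```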
Goal: minimize an unknown function $\varphi$ in as few evaluations as possible (Bayesian optimization).
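One common strategy is to fit a Gaussian process to the evaluations seen so far and pick the next point by an acquisition rule. A sketch using a lower-confidence-bound rule; the objective $\varphi$, the kernel, the candidate grid, and all hyperparameters below are illustrative placeholders, not the talk's method:

```python
import numpy as np

def k(a, b, kappa=0.2):  # RBF kernel, an illustrative choice
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * kappa**2))

def posterior(xs, ys, grid, noise=1e-4):
    """Posterior mean and variance of GP(0, k) on a grid of candidates."""
    K = k(xs, xs) + noise * np.eye(len(xs))
    Ks = k(grid, xs)
    mean = Ks @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

phi = lambda x: np.sin(3 * x) + 0.5 * x        # stand-in "unknown" objective
grid = np.linspace(0.0, 2.0, 200)              # candidate points
xs = np.array([0.5, 1.5])                      # initial evaluations
ys = phi(xs)
for _ in range(10):                            # evaluation budget
    mean, var = posterior(xs, ys, grid)
    x_next = grid[np.argmin(mean - 2.0 * np.sqrt(var))]  # lower confidence bound
    xs, ys = np.append(xs, x_next), np.append(ys, phi(x_next))
print(xs[np.argmin(ys)])                       # best point found
```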
The Matérn kernel family:
$$k_{\nu, \kappa, \sigma^2}(x, x') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \sqrt{2\nu}\, \frac{\|x - x'\|}{\kappa} \right)^{\nu} K_\nu\!\left( \sqrt{2\nu}\, \frac{\|x - x'\|}{\kappa} \right), \qquad k_{\infty, \kappa, \sigma^2}(x, x') = \sigma^2 \exp\!\left( -\frac{\|x - x'\|^2}{2\kappa^2} \right)$$
$\sigma^2$: variance
$\kappa$: length scale
$\nu$: smoothness
$\nu \to \infty$: Gaussian kernel (RBF)
[Figure: Matérn GP samples for $\nu = 1/2$, $\nu = 3/2$, $\nu = 5/2$, $\nu = \infty$]
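A sketch of the Euclidean Matérn kernel for general $\nu$, using SciPy's modified Bessel function of the second kind $K_\nu$; the grid of distances is illustrative:

```python
import numpy as np
from scipy.special import gamma, kv

def matern(r, nu=1.5, kappa=1.0, sigma2=1.0):
    """k(x, x') as a function of r = ||x - x'||; nu = inf gives the RBF kernel."""
    if np.isinf(nu):
        return sigma2 * np.exp(-r**2 / (2 * kappa**2))
    r = np.maximum(r, 1e-12)  # K_nu diverges at 0, but k(x, x) -> sigma2
    s = np.sqrt(2 * nu) * r / kappa
    return sigma2 * (2 ** (1 - nu) / gamma(nu)) * s**nu * kv(nu, s)

r = np.linspace(0.0, 3.0, 100)
for nu in (0.5, 1.5, 2.5, np.inf):  # the four panels above
    values = matern(r, nu=nu)
```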
$k_{\infty, \kappa, \sigma^2}^{(d_g)}(x, x') = \sigma^2 \exp\!\left( -\frac{d_g(x, x')^2}{2\kappa^2} \right)$, where $d_g$ is the geodesic distance.
Theorem (Feragen et al.). Let $M$ be a complete Riemannian manifold without boundary. If $k_{\infty, \kappa, \sigma^2}^{(d_g)}$ is positive semi-definite for all $\kappa$, then $M$ is isometric to a Euclidean space.
Matérn SPDE (Whittle): $\left( \frac{2\nu}{\kappa^2} - \Delta \right)^{\frac{\nu}{2} + \frac{d}{4}} f = \mathcal{W}$, where $\Delta$ is the Laplacian and $\mathcal{W}$ is Gaussian white noise.
Define the Sobolev space $H^s(M) = (1 - \Delta)^{-s/2} L^2(M)$ (a Bessel potential space).
Then the solution of $\left( \frac{2\nu}{\kappa^2} - \Delta \right)^{\frac{\nu}{2} + \frac{d}{4}} f = \mathcal{W}$ (at the very least, the solution of $(1 - \Delta)^{\frac{\nu}{2} + \frac{d}{4}} f = \mathcal{W}$) may be regarded as the isonormal process on $H^{\nu + d/2}$.
It has the reproducing kernel of $H^{\nu + d/2}$ as its kernel.
The solution is a Gaussian process with kernel
$$k_{\nu, \kappa, \sigma^2}(x, x') = \frac{\sigma^2}{C_\nu} \sum_{n=0}^{\infty} \left( \frac{2\nu}{\kappa^2} - \lambda_n \right)^{-\nu - \frac{d}{2}} f_n(x) f_n(x'),$$
where $f_n$ and $\lambda_n \le 0$ are the eigenfunctions and eigenvalues of $\Delta$, and $C_\nu$ is a normalizing constant.
On a graph with vertex set $V$, the solution is a Gaussian process with kernel
$$k_{\nu, \kappa, \sigma^2}(i, j) = \frac{\sigma^2}{C_\nu} \sum_{n=0}^{|V|-1} \left( \frac{2\nu}{\kappa^2} + \lambda_n \right)^{-\nu} f_n(i) f_n(j),$$
where $\lambda_n \ge 0$ and $f_n$ are now the eigenvalues and eigenvectors of the graph Laplacian; see the sketch below.
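A minimal NumPy sketch of this spectral formula on an illustrative path graph; folding $C_\nu$ into a normalization that sets the average variance to $\sigma^2$ is an assumption about the normalization convention:

```python
import numpy as np

def graph_matern_kernel(L, nu=1.5, kappa=1.0, sigma2=1.0):
    """K[i, j] from eigenpairs (lambda_n, f_n) of the graph Laplacian L."""
    lam, F = np.linalg.eigh(L)                 # L f_n = lambda_n f_n
    spec = (2 * nu / kappa**2 + lam) ** (-nu)  # spectral weights
    K = (F * spec) @ F.T                       # sum_n spec_n f_n(i) f_n(j)
    return sigma2 * K / np.mean(np.diag(K))    # normalize average variance

# Path graph on 5 vertices: L = D - A.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(axis=1)) - A
K = graph_matern_kernel(L)
```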
Starting from $x_0$, roll the dynamics forward with Euler steps: $x_1 = x_0 + f(x_0)\,\Delta t$, $x_2 = x_1 + f(x_1)\,\Delta t$, and so on.
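A minimal sketch of this Euler rollout; the toy linear dynamics below are a placeholder standing in for a learned $f$ (e.g. a GP posterior mean):

```python
import numpy as np

def euler_rollout(f, x0, dt=0.1, steps=50):
    """x_{t+1} = x_t + f(x_t) * dt, starting from x_0."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + f(xs[-1]) * dt)
    return np.stack(xs)

trajectory = euler_rollout(lambda x: -x, x0=[1.0])  # toy linear dynamics
```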
Interpolation: $f \mid f(\mathbf{x}) = \mathbf{y}$.
$$\hat m = \operatorname*{arg\,min}_{f \in H_k} \|f\|_{H_k} \quad \text{subject to } f(x_i) = y_i,$$
$$(\hat m(x) - f_{\mathrm{true}}(x))^2 \le \|f_{\mathrm{true}}\|_{H_k}^2\, \hat k(x, x).$$
Regression: $f \mid f(\mathbf{x}) + \varepsilon = \mathbf{y}$, where $\varepsilon \sim \mathrm{N}(0, \sigma_n^2 I)$.
$$\hat m = \operatorname*{arg\,min}_{f \in H_k} \sum_{i=1}^{n} (f(x_i) - y_i)^2 + \sigma_n^2 \|f\|_{H_k}^2,$$
$$(\hat m(x) - f_{\mathrm{true}}(x))^2 \le \|f_{\mathrm{true}}\|_{H_k^\sigma}^2 \left( \hat k(x, x) + \sigma_n^2 \right).$$
Here $H_k^\sigma = H_k + H_{\sigma_n^2 \delta}$ and $x \ne x_i$, $i = 1, \ldots, n$.
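A numerical check of this equivalence: the GP posterior mean with noise $\sigma_n^2$ coincides with the kernel ridge minimizer above, computed here via the representer-theorem normal equations. The kernel, data, and grid are illustrative:

```python
import numpy as np

def k(a, b, kappa=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * kappa**2))

x = np.array([0.0, 0.3, 0.7, 1.0])
y = np.array([0.1, 0.9, 0.4, -0.2])
s2 = 0.05                                  # sigma_n^2
grid = np.linspace(0.0, 1.0, 25)
K = k(x, x)

# GP view: posterior mean  m_hat(.) = k(., x) (K + s2 I)^{-1} y.
gp_mean = k(grid, x) @ np.linalg.solve(K + s2 * np.eye(len(x)), y)

# RKHS view: by the representer theorem f = sum_i a_i k(., x_i); plugging
# into the ridge objective gives the normal equations (K K + s2 K) a = K y.
a = np.linalg.solve(K @ K + s2 * K, K @ y)
krr_mean = k(grid, x) @ a

assert np.allclose(gp_mean, krr_mean)
```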
Moments $\hat m(x)$ and $\hat k(x, x)$ of the posterior Gaussian process may be expressed in terms of the RKHS $H_k$.
Consider a covariance kernel $k$ living on a compact space $X$ with a finite measure $\nu$.
A generalized Driscoll's zero-one law. Consider $f \sim \mathrm{GP}(0, k)$ and a kernel $r$ such that $H_k \subseteq H_r$. Let $I_{kr} : H_k \to H_r$ be the natural inclusion operator. Then $f \in H_r$ with probability one if $I_{kr}$ is Hilbert–Schmidt, and with probability zero otherwise.
Thus $f \in H_k$ with probability 0 (the identity map on an infinite-dimensional $H_k$ is never Hilbert–Schmidt), however...
For any $0 < \theta < 1$ we have, with probability 1: $f \in (H_k, L^2(X, \nu))_{\theta, 2}$.
Own works:
V. Borovitskiy, A. Terenin, P. Mostowsky, M. P. Deisenroth. Matérn Gaussian Processes on Riemannian Manifolds.
In Neural Information Processing Systems (NeurIPS) 2020.
V. Borovitskiy, I. Azangulov, A. Terenin, P. Mostowsky, M. P. Deisenroth. Matérn Gaussian Processes on Graphs.
In International Conference on Artificial Intelligence and Statistics (AISTATS) 2021.
M. Hutchinson, A. Terenin, V. Borovitskiy, S. Takao, Y. W. Teh, M. P. Deisenroth. Vector-valued Gaussian Processes on Riemannian Manifolds via Gauge-Independent Projected Kernels. To appear in NeurIPS 2021.
Other works:
M. Kanagawa, P. Hennig, D. Sejdinovic, B. K. Sriperumbudur. Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences. arXiv preprint arXiv:1807.02582, 2018.
I. Steinwart. Convergence Types and Rates in Generic Karhunen–Loève Expansions with Applications to Sample Path Properties. Potential Analysis, 2018.
F. Lindgren, H. Rue, J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B, 2011.
A. Feragen, F. Lauze, S. Hauberg. Geodesic Exponential Kernels: When Curvature and Linearity Conflict. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
P. Whittle. On Stationary Processes in the Plane. Biometrika, 1954.
viacheslav.borovitskiy@gmail.com https://vab.im