Kernel (Covariance) Function Options
In supervised learning, points with similar predictor values $x_i$ are expected to have close response (target) values $y_i$. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables $f(x_i)$ and $f(x_j)$, where both $x_i$ and $x_j$ are d-by-1 vectors. In other words, it determines how the response at one point $x_i$ is affected by responses at other points $x_j$, i ≠ j, i = 1, 2, ..., n. The covariance function $k(x_i, x_j)$ can be defined by various kernel functions. It can be parameterized in terms of the kernel parameters in vector $\theta$. Hence, it is possible to express the covariance function as $k(x_i, x_j \mid \theta)$.
For many standard kernel functions, the kernel parameters are based on the signal standard deviation $\sigma_f$ and the characteristic length scale $\sigma_l$. The characteristic length scale roughly defines how far apart the input values $x_i$ can be for the response values to become uncorrelated. Both $\sigma_l$ and $\sigma_f$ need to be greater than 0, and this can be enforced by the unconstrained parametrization vector $\theta$, such that
$$\theta_1 = \log \sigma_l, \qquad \theta_2 = \log \sigma_f.$$
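The log parametrization can be illustrated with a short MATLAB sketch. The variable names and numeric values are illustrative assumptions, not part of fitrgp; the point is only that any real-valued vector maps back to strictly positive kernel parameters.

```matlab
% Constrained kernel parameters (both must be positive).
sigmaL = 1.5;   % characteristic length scale (illustrative value)
sigmaF = 0.2;   % signal standard deviation (illustrative value)

% Unconstrained parametrization: theta = [log(sigmaL); log(sigmaF)].
theta = [log(sigmaL); log(sigmaF)];

% Any real-valued theta maps back to strictly positive parameters.
sigmaLBack = exp(theta(1));
sigmaFBack = exp(theta(2));

% Even a large negative entry of theta yields a positive length scale.
tinyLengthScale = exp(-20);
```

An optimizer can therefore search over theta without bound constraints while the kernel parameters stay valid.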
The built-in kernel (covariance) functions with the same length scale for each predictor are:
Squared Exponential Kernel
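This is one of the most commonly used covariance functions and is the default option for fitrgp. The squared exponential kernel function is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[-\frac{1}{2}\,\frac{(x_i - x_j)^T (x_i - x_j)}{\sigma_l^2}\right],$$
where $\sigma_l$ is the characteristic length scale, and $\sigma_f$ is the signal standard deviation.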
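As a rough illustration of the squared exponential kernel, the sketch below builds the covariance matrix for three one-dimensional points. The helpers sqDist and sqExpKernel are hypothetical names introduced here, not fitrgp internals, and the code assumes implicit expansion (MATLAB R2016b or later).

```matlab
% Pairwise squared Euclidean distances between rows of XN (N-by-d) and
% XM (M-by-d). max(...,0) guards against tiny negative round-off values.
sqDist = @(XN, XM) max(sum(XN.^2, 2) + sum(XM.^2, 2).' - 2 * (XN * XM.'), 0);

% Hypothetical helper: squared exponential kernel matrix (N-by-M).
sqExpKernel = @(XN, XM, sigmaL, sigmaF) ...
    sigmaF^2 * exp(-0.5 * sqDist(XN, XM) / sigmaL^2);

X = [0; 1; 3];          % three one-dimensional points
sigmaL = 1;             % characteristic length scale
sigmaF = 2;             % signal standard deviation
K = sqExpKernel(X, X, sigmaL, sigmaF);

% Diagonal entries equal sigmaF^2; off-diagonal entries shrink with distance.
disp(K)
```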
Exponential Kernel
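You can specify the exponential kernel function using the 'KernelFunction','exponential' name-value pair argument. This covariance function is defined by
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left(-\frac{r}{\sigma_l}\right),$$
where $\sigma_l$ is the characteristic length scale and
$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$
is the Euclidean distance between $x_i$ and $x_j$.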
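A minimal sketch of selecting this kernel by name, assuming Statistics and Machine Learning Toolbox is available. The synthetic data and variable names are made up for illustration.

```matlab
rng(0);                                  % reproducible synthetic data
X = linspace(0, 10, 50)';                % one predictor, 50 observations
y = sin(X) + 0.1 * randn(50, 1);         % noisy response

% Fit a GPR model with the exponential kernel selected by name.
gprMdl = fitrgp(X, y, 'KernelFunction', 'exponential');

% Fitted kernel parameters and their names, as stored in the model.
fittedParams = gprMdl.KernelInformation.KernelParameters;
paramNames = gprMdl.KernelInformation.KernelParameterNames;
```

The same pattern should apply to the other built-in kernels; only the 'KernelFunction' value changes.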
Matern 3/2
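You can specify the Matern 3/2 kernel function using the 'KernelFunction','matern32' name-value pair argument. This covariance function is defined by
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,r}{\sigma_l}\right) \exp\left(-\frac{\sqrt{3}\,r}{\sigma_l}\right),$$
where
$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$
is the Euclidean distance between $x_i$ and $x_j$.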
Matern 5/2
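You can specify the Matern 5/2 kernel function using the 'KernelFunction','matern52' name-value pair argument. The Matern 5/2 covariance function is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5 r^2}{3 \sigma_l^2}\right) \exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right),$$
where
$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$
is the Euclidean distance between $x_i$ and $x_j$.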
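To see how the two Matern kernels behave as functions of the distance r, the following sketch evaluates both on a few distances. The function handles matern32 and matern52 are hypothetical helpers written from the standard Matern 3/2 and 5/2 definitions, with illustrative parameter values.

```matlab
sigmaL = 1.0;   % characteristic length scale (illustrative value)
sigmaF = 1.0;   % signal standard deviation (illustrative value)

% Matern 3/2 and Matern 5/2 kernels as functions of the distance r.
matern32 = @(r) sigmaF^2 * (1 + sqrt(3) * r / sigmaL) ...
    .* exp(-sqrt(3) * r / sigmaL);
matern52 = @(r) sigmaF^2 ...
    * (1 + sqrt(5) * r / sigmaL + 5 * r.^2 / (3 * sigmaL^2)) ...
    .* exp(-sqrt(5) * r / sigmaL);

r = [0 0.5 1 2];      % distances at which to evaluate the kernels
k32 = matern32(r);
k52 = matern52(r);

% Both kernels equal sigmaF^2 at r = 0 and decay as r grows.
disp([r; k32; k52])
```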
Rational Quadratic Kernel
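You can specify the rational quadratic kernel function using the 'KernelFunction','rationalquadratic' name-value pair argument. This covariance function is defined by
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{r^2}{2 \alpha \sigma_l^2}\right)^{-\alpha},$$
where $\sigma_l$ is the characteristic length scale, $\alpha$ is a positive-valued scale-mixture parameter, and
$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$
is the Euclidean distance between $x_i$ and $x_j$.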
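The role of the scale-mixture parameter can be explored numerically. The sketch below, written from the standard rational quadratic definition with hypothetical helper names, compares a small and a very large alpha; for large alpha the kernel is known to approach the squared exponential kernel with the same length scale.

```matlab
sigmaL = 1.0;   % characteristic length scale (illustrative value)
sigmaF = 1.0;   % signal standard deviation (illustrative value)

% Rational quadratic kernel as a function of distance r and alpha.
ratQuad = @(r, alpha) sigmaF^2 ...
    * (1 + r.^2 / (2 * alpha * sigmaL^2)).^(-alpha);

% Squared exponential kernel, used here only as the large-alpha reference.
sqExp = @(r) sigmaF^2 * exp(-0.5 * r.^2 / sigmaL^2);

r = [0 0.5 1 2];
kSmallAlpha = ratQuad(r, 0.5);   % heavier tail than the squared exponential
kLargeAlpha = ratQuad(r, 1e6);   % nearly the squared exponential

% Largest gap between the large-alpha kernel and the squared exponential.
maxGap = max(abs(kLargeAlpha - sqExp(r)));
```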
It is possible to use a separate length scale $\sigma_m$ for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization $\theta$ in this case is
$$\theta_m = \log \sigma_m, \quad m = 1, 2, \ldots, d, \qquad \theta_{d+1} = \log \sigma_f.$$
The built-in kernel (covariance) functions with a separate length scale for each predictor are:
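For the ARD case, the unconstrained vector has one entry per predictor plus one for the signal standard deviation. A brief sketch with d = 3 predictors follows; the names and values are illustrative assumptions.

```matlab
sigmaM = [0.5; 2.0; 10.0];   % one length scale per predictor (d = 3)
sigmaF = 1.5;                % signal standard deviation

% Unconstrained ARD parametrization: (d+1)-by-1 vector of logs.
theta = [log(sigmaM); log(sigmaF)];

% Recover the positive parameters from theta.
d = numel(sigmaM);
sigmaMBack = exp(theta(1:d));
sigmaFBack = exp(theta(d + 1));
```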
ARD Squared Exponential Kernel
You can specify this kernel function using the 'KernelFunction','ardsquaredexponential' name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[-\frac{1}{2} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right].$$
ARD Exponential Kernel
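You can specify this kernel function using the 'KernelFunction','ardexponential' name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp(-r),$$
where
$$r = \sqrt{\sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$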
ARD Matern 3/2
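You can specify this kernel function using the 'KernelFunction','ardmatern32' name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a separate length scale for each predictor. It is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \sqrt{3}\,r\right) \exp\left(-\sqrt{3}\,r\right),$$
where
$$r = \sqrt{\sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$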
ARD Matern 5/2
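You can specify this kernel function using the 'KernelFunction','ardmatern52' name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a separate length scale for each predictor. It is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \sqrt{5}\,r + \frac{5}{3} r^2\right) \exp\left(-\sqrt{5}\,r\right),$$
where
$$r = \sqrt{\sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$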
ARD Rational Quadratic Kernel
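You can specify this kernel function using the 'KernelFunction','ardrationalquadratic' name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as
$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{1}{2\alpha} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right)^{-\alpha}.$$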
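The effect of per-predictor length scales can be sketched with the ARD squared exponential kernel. The helper ardSqExp is a hypothetical name written from the standard ARD definition, not the fitrgp implementation, and it assumes implicit expansion (MATLAB R2016b or later). A very large length scale for a predictor makes differences in that predictor nearly irrelevant to the covariance.

```matlab
% Hypothetical helper: ARD squared exponential kernel matrix (N-by-M).
% XN is N-by-d, XM is M-by-d, sigmaM is d-by-1, sigmaF is a scalar.
% Each predictor is divided by its own length scale before the squared
% distance is computed.
ardSqExp = @(XN, XM, sigmaM, sigmaF) sigmaF^2 * exp(-0.5 * max( ...
    sum((XN ./ sigmaM.').^2, 2) + sum((XM ./ sigmaM.').^2, 2).' ...
    - 2 * ((XN ./ sigmaM.') * (XM ./ sigmaM.').'), 0));

xa = [0 0];
xb = [0 5];   % differs from xa only in predictor 2

% Equal length scales: the gap in predictor 2 drives the covariance down.
kShort = ardSqExp(xa, xb, [1; 1], 1);

% Huge length scale for predictor 2: the same gap barely matters.
kLong = ardSqExp(xa, xb, [1; 1e6], 1);
```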
You can specify the kernel function using the KernelFunction name-value pair argument in a call to fitrgp. You can either specify one of the built-in kernel options or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values of the unconstrained parametrization vector $\theta$. fitrgp uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas when using a custom kernel function it uses numerical derivatives.
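The two ways of supplying initial values can be sketched as follows, assuming Statistics and Machine Learning Toolbox is available. The data, variable names, and starting values are illustrative; the custom handle kfcn reimplements the squared exponential kernel in terms of the unconstrained vector theta = [log(sigmaL); log(sigmaF)].

```matlab
rng(0);                                  % reproducible synthetic data
X = linspace(0, 10, 50)';
y = sin(X) + 0.1 * randn(50, 1);

% Built-in kernel: initial values are the positive parameters themselves,
% ordered as [sigmaL0; sigmaF0] for the squared exponential kernel.
sigmaL0 = 1.0;
sigmaF0 = 0.5;
mdlBuiltIn = fitrgp(X, y, 'KernelFunction', 'squaredexponential', ...
    'KernelParameters', [sigmaL0; sigmaF0]);

% Custom kernel: kfcn(XN, XM, theta) must return an N-by-M matrix, and
% theta holds the unconstrained (log) parameters.
kfcn = @(XN, XM, theta) exp(theta(2))^2 * ...
    exp(-(pdist2(XN, XM).^2) / (2 * exp(theta(1))^2));
theta0 = [log(sigmaL0); log(sigmaF0)];
mdlCustom = fitrgp(X, y, 'KernelFunction', kfcn, ...
    'KernelParameters', theta0);
```

Because the custom handle is opaque to fitrgp, fitting it relies on numerical derivatives and may be slower than the equivalent built-in kernel.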
References
[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.
[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.