Fully Independent Conditional Approximation for GPR Models

The fully independent conditional (FIC) approximation [1] systematically approximates the true GPR kernel function. It avoids the predictive variance problem of the SR approximation, which tends to underestimate predictive variance, while still maintaining a valid Gaussian process. You can specify the FIC method for parameter estimation by using the 'FitMethod','fic' name-value pair argument in the call to fitrgp. For prediction using FIC, you can use the 'PredictMethod','fic' name-value pair argument in the call to fitrgp.

Approximating the Kernel Function

The FIC approximation to $k(x_i, x_j \mid θ)$ for active set $A \subset N = \{1, 2, \ldots, n\}$ is given by:

$\hat{k}_{\mathrm{FIC}}(x_i, x_j \mid θ, A) = \hat{k}_{\mathrm{SR}}(x_i, x_j \mid θ, A) + δ_{ij}\left(k(x_i, x_j \mid θ) - \hat{k}_{\mathrm{SR}}(x_i, x_j \mid θ, A)\right), \qquad δ_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$

That is, the FIC approximation equals the SR approximation if $i \neq j$. For $i = j$, the software uses the exact kernel value rather than an approximation. Define an n-by-n diagonal matrix $Ω(X \mid θ, A)$ as follows:

$[Ω(X \mid θ, A)]_{ij} = δ_{ij}\left(k(x_i, x_j \mid θ) - \hat{k}_{\mathrm{SR}}(x_i, x_j \mid θ, A)\right) = \begin{cases} k(x_i, x_j \mid θ) - \hat{k}_{\mathrm{SR}}(x_i, x_j \mid θ, A) & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$
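The elementwise definition can be illustrated with a minimal NumPy sketch. It assumes a squared exponential kernel with unit signal standard deviation and unit length scale; the kernel, the helper names `se_kernel`, `k_sr`, and `k_fic`, and the data are all hypothetical illustrations, not part of `fitrgp`. The sketch shows that FIC returns the exact kernel value when the indices match and the SR value otherwise, so the diagonal entry of $Ω$ is exactly the gap between the two.

```python
import numpy as np


def se_kernel(a, b, sigma_f=1.0, ell=1.0):
    """Assumed kernel: squared exponential k(a, b | theta) on 1-D inputs."""
    a = np.atleast_1d(a)[:, None]
    b = np.atleast_1d(b)[None, :]
    return sigma_f**2 * np.exp(-0.5 * (a - b) ** 2 / ell**2)


def k_sr(xi, xj, XA):
    """SR approximation k(xi, X_A) K(X_A, X_A)^{-1} k(X_A, xj)."""
    KAA = se_kernel(XA, XA)
    return (se_kernel(xi, XA) @ np.linalg.solve(KAA, se_kernel(XA, xj))).item()


def k_fic(xi, xj, i, j, XA):
    """FIC approximation: SR value plus the delta_ij correction toward the exact kernel."""
    sr = k_sr(xi, xj, XA)
    delta = 1.0 if i == j else 0.0  # delta_ij compares indices, not input values
    return sr + delta * (se_kernel(xi, xj).item() - sr)


# Hypothetical data: 21 inputs on [0, 10]; every 5th one is in the active set A.
X = np.linspace(0.0, 10.0, 21)
XA = X[::5]
print(k_fic(X[3], X[3], 3, 3, XA), k_sr(X[3], X[3], XA))  # exact variance vs. SR value
print(k_fic(X[3], X[4], 3, 4, XA), k_sr(X[3], X[4], XA))  # identical off the diagonal
```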

The FIC approximation to $K (X, X | θ)$ is then given by:

$\begin{aligned} \hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) &= \hat{K}_{\mathrm{SR}}(X, X \mid θ, A) + Ω(X \mid θ, A) \\ &= K(X, X_A \mid θ)\, K(X_A, X_A \mid θ)^{-1} K(X_A, X \mid θ) + Ω(X \mid θ, A). \end{aligned}$
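The matrix form can be checked numerically. The sketch below again assumes a squared exponential kernel and hypothetical data; `fic_matrix` is an illustrative name. It builds the SR matrix and $Ω$, then confirms that the diagonal of the FIC matrix is exact while the off-diagonal entries equal the SR ones. The diagonal of $Ω$ is non-negative, because the SR value of $k(x_i, x_i \mid θ)$ never exceeds the exact value. Adding it to the positive semidefinite SR matrix therefore keeps the result positive semidefinite, as a covariance matrix must be.

```python
import numpy as np


def se_kernel(a, b, sigma_f=1.0, ell=1.0):
    """Assumed kernel: squared exponential on 1-D inputs, returns K(a, b | theta)."""
    return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)


def fic_matrix(X, XA):
    """Return (K_SR, Omega, K_FIC) for training inputs X and active-set inputs XA."""
    KXA = se_kernel(X, XA)                    # K(X, X_A | theta), n-by-m
    KAA = se_kernel(XA, XA)                   # K(X_A, X_A | theta), m-by-m
    K_sr = KXA @ np.linalg.solve(KAA, KXA.T)  # K(X, X_A) K(X_A, X_A)^{-1} K(X_A, X)
    k_diag = np.diag(se_kernel(X, X))         # exact k(x_i, x_i | theta); dense only for the demo
    Omega = np.diag(k_diag - np.diag(K_sr))   # diagonal matrix Omega(X | theta, A)
    return K_sr, Omega, K_sr + Omega


# Hypothetical data: 21 inputs on [0, 10]; every 5th one is in the active set A.
X = np.linspace(0.0, 10.0, 21)
XA = X[::5]
K_sr, Omega, K_fic = fic_matrix(X, XA)
K = se_kernel(X, X)
off = ~np.eye(len(X), dtype=bool)
print(np.allclose(np.diag(K_fic), np.diag(K)))  # diagonal is exact
print(np.allclose(K_fic[off], K_sr[off]))       # off-diagonal entries are SR
print(np.linalg.eigvalsh(K_fic).min())          # smallest eigenvalue, nonnegative up to rounding
```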

Parameter Estimation

Replacing $K (X, X | θ)$ by ${\hat{K}}_{F I C} (X, X | θ, A)$ in the marginal log likelihood function produces its FIC approximation:

$\begin{aligned} \log P_{\mathrm{FIC}}(y \mid X, β, θ, σ^2, A) = {} & -\frac{1}{2}(y - Hβ)^T \left[\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right]^{-1}(y - Hβ) \\ & - \frac{n}{2}\log 2π - \frac{1}{2}\log\left|\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right|. \end{aligned}$
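As a reference point, the approximate log likelihood can be evaluated literally by forming the n-by-n matrix. This is a sketch under the same assumptions (squared exponential kernel, hypothetical data, and the illustrative name `fic_log_likelihood_direct`). It costs O(n³), so it is useful mainly for checking faster formulas rather than for fitting.

```python
import numpy as np


def se_kernel(a, b, sigma_f=1.0, ell=1.0):
    """Assumed kernel: squared exponential on 1-D inputs."""
    return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)


def fic_log_likelihood_direct(y, H, beta, X, XA, sigma2):
    """log P_FIC(y | X, beta, theta, sigma^2, A), evaluated with dense n-by-n matrices.

    Costs O(n^3), so it is only useful as a reference implementation."""
    n = len(y)
    KXA = se_kernel(X, XA)
    K_sr = KXA @ np.linalg.solve(se_kernel(XA, XA), KXA.T)
    K_fic = K_sr + np.diag(np.diag(se_kernel(X, X)) - np.diag(K_sr))
    C = K_fic + sigma2 * np.eye(n)    # K_FIC(X, X | theta, A) + sigma^2 I_n
    r = y - H @ beta                  # residual y - H beta
    _, logdet = np.linalg.slogdet(C)  # log|K_FIC + sigma^2 I_n|
    return -0.5 * r @ np.linalg.solve(C, r) - 0.5 * n * np.log(2 * np.pi) - 0.5 * logdet


# Hypothetical data: a constant basis H = 1 and a fixed beta.
X = np.linspace(0.0, 10.0, 21)
y = np.sin(X)
H = np.ones((len(X), 1))
beta = np.array([0.1])
print(fic_log_likelihood_direct(y, H, beta, X, X[::5], sigma2=0.1))
```

If the active set contains every training point, the SR approximation reproduces the exact kernel matrix, so this function returns the exact GPR marginal log likelihood.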

As in the exact method, the software estimates the parameters by first computing $\hat{β}(θ, σ^2)$, the optimal estimate of $β$ given $θ$ and $σ^2$. It then estimates $θ$ and $σ^2$ using the $β$-profiled marginal log likelihood. The FIC estimate of $β$ for given $θ$ and $σ^2$ is

$\hat{β}_{\mathrm{FIC}}(θ, σ^2, A) = \Big[\underbrace{H^T\left(\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right)^{-1} H}_{*}\Big]^{-1} \underbrace{H^T\left(\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right)^{-1} y}_{**},$

where

$\begin{aligned} {*} &= H^T Λ(θ, σ^2, A)^{-1} H - H^T Λ(θ, σ^2, A)^{-1} K(X, X_A \mid θ)\, B_A^{-1} K(X_A, X \mid θ)\, Λ(θ, σ^2, A)^{-1} H, \\ {**} &= H^T Λ(θ, σ^2, A)^{-1} y - H^T Λ(θ, σ^2, A)^{-1} K(X, X_A \mid θ)\, B_A^{-1} K(X_A, X \mid θ)\, Λ(θ, σ^2, A)^{-1} y, \\ B_A &= K(X_A, X_A \mid θ) + K(X_A, X \mid θ)\, Λ(θ, σ^2, A)^{-1} K(X, X_A \mid θ), \\ Λ(θ, σ^2, A) &= Ω(X \mid θ, A) + σ^2 I_n. \end{aligned}$
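The $*$ and $**$ terms apply the inverse of the FIC covariance only through $Λ$, which is diagonal, and through $B_A$, which is m-by-m when the active set has m points. No n-by-n matrix needs to be formed or inverted. The following sketch computes $\hat{β}_{\mathrm{FIC}}$ that way. It assumes a squared exponential kernel, so that $k(x, x \mid θ) = σ_f^2$ with `sigma_f` an assumed signal standard deviation, and a hypothetical linear basis $h(x) = [1, x]$; `beta_fic` is an illustrative name, not a MATLAB function.

```python
import numpy as np


def se_kernel(a, b, sigma_f=1.0, ell=1.0):
    """Assumed kernel: squared exponential on 1-D inputs."""
    return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)


def beta_fic(y, H, X, XA, sigma2, sigma_f=1.0):
    """beta_hat_FIC(theta, sigma^2, A) from the * and ** terms, without any n-by-n matrix."""
    KXA = se_kernel(X, XA, sigma_f)                               # n-by-m
    KAA = se_kernel(XA, XA, sigma_f)                              # m-by-m
    q_diag = np.sum(KXA * np.linalg.solve(KAA, KXA.T).T, axis=1)  # diag of K_SR
    lam = sigma_f**2 - q_diag + sigma2  # diag of Lambda; k(x, x) = sigma_f^2 for this kernel
    LH = H / lam[:, None]               # Lambda^{-1} H
    Ly = y / lam                        # Lambda^{-1} y
    B_A = KAA + KXA.T @ (KXA / lam[:, None])
    star = H.T @ LH - LH.T @ KXA @ np.linalg.solve(B_A, KXA.T @ LH)
    star2 = H.T @ Ly - LH.T @ KXA @ np.linalg.solve(B_A, KXA.T @ Ly)
    return np.linalg.solve(star, star2)


# Hypothetical data with a linear basis h(x) = [1, x].
X = np.linspace(0.0, 10.0, 21)
y = np.sin(X) + 0.5 * X
H = np.column_stack([np.ones_like(X), X])
print(beta_fic(y, H, X, X[::5], sigma2=0.1))
```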

Using $\hat{β}_{\mathrm{FIC}}(θ, σ^2, A)$, the $β$-profiled marginal log likelihood for the FIC approximation is:

$\begin{aligned} \log P_{\mathrm{FIC}}(y \mid X, \hat{β}_{\mathrm{FIC}}(θ, σ^2, A), θ, σ^2, A) = {} & -\frac{1}{2}\left(y - H\hat{β}_{\mathrm{FIC}}(θ, σ^2, A)\right)^T \left(\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right)^{-1} \left(y - H\hat{β}_{\mathrm{FIC}}(θ, σ^2, A)\right) \\ & - \frac{n}{2}\log 2π - \frac{1}{2}\log\left|\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right|, \end{aligned}$

where, by the Woodbury matrix identity and the matrix determinant lemma,

$\begin{aligned} \left(\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right)^{-1} &= Λ(θ, σ^2, A)^{-1} - Λ(θ, σ^2, A)^{-1} K(X, X_A \mid θ)\, B_A^{-1} K(X_A, X \mid θ)\, Λ(θ, σ^2, A)^{-1}, \\ \log\left|\hat{K}_{\mathrm{FIC}}(X, X \mid θ, A) + σ^2 I_n\right| &= \log|Λ(θ, σ^2, A)| + \log|B_A| - \log|K(X_A, X_A \mid θ)|. \end{aligned}$
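These two identities let the profiled log likelihood be evaluated in O(nm²) time, where m is the active-set size, instead of O(n³). The sketch below combines $\hat{β}_{\mathrm{FIC}}$, the Woodbury form of the inverse, and the determinant identity. It uses the same assumed squared exponential kernel and hypothetical data; `fic_profiled_loglik` is an illustrative name. Comparing its output with a dense computation on small data is a simple way to check such an implementation.

```python
import numpy as np


def se_kernel(a, b, sigma_f=1.0, ell=1.0):
    """Assumed kernel: squared exponential on 1-D inputs."""
    return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)


def fic_profiled_loglik(y, H, X, XA, sigma2, sigma_f=1.0):
    """beta-profiled FIC log likelihood using the Woodbury and determinant identities."""
    n = len(y)
    KXA = se_kernel(X, XA, sigma_f)
    KAA = se_kernel(XA, XA, sigma_f)
    q_diag = np.sum(KXA * np.linalg.solve(KAA, KXA.T).T, axis=1)  # diag of K_SR
    lam = sigma_f**2 - q_diag + sigma2        # diag of Lambda(theta, sigma^2, A)
    B_A = KAA + KXA.T @ (KXA / lam[:, None])  # m-by-m

    def apply_inv(V):
        """(K_FIC + sigma^2 I_n)^{-1} V via the Woodbury form; V is n-by-p."""
        LV = V / lam[:, None]
        return LV - (KXA @ np.linalg.solve(B_A, KXA.T @ LV)) / lam[:, None]

    Y = y[:, None]
    beta = np.linalg.solve(H.T @ apply_inv(H), H.T @ apply_inv(Y))  # beta_hat_FIC
    r = Y - H @ beta
    quad = (r.T @ apply_inv(r)).item()
    logdet = np.sum(np.log(lam)) + np.linalg.slogdet(B_A)[1] - np.linalg.slogdet(KAA)[1]
    return -0.5 * quad - 0.5 * n * np.log(2 * np.pi) - 0.5 * logdet


# Hypothetical data with a linear basis h(x) = [1, x].
X = np.linspace(0.0, 10.0, 21)
y = np.sin(X) + 0.5 * X
H = np.column_stack([np.ones_like(X), X])
print(fic_profiled_loglik(y, H, X, X[::5], sigma2=0.1))
```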

Prediction

The FIC approximation to the distribution of $y_{\mathrm{new}}$ given $y$, $X$, and $x_{\mathrm{new}}$ is

$P(y_{\mathrm{new}} \mid y, X, x_{\mathrm{new}}) = \mathcal{N}\left(y_{\mathrm{new}} \mid h(x_{\mathrm{new}})^T β + μ_{\mathrm{FIC}},\ σ_{\mathrm{new}}^2 + Σ_{\mathrm{FIC}}\right),$

where $μ_{\mathrm{FIC}}$ and $Σ_{\mathrm{FIC}}$ are the FIC approximations to the quantities $μ$ and $Σ$ used for prediction in the exact GPR method. As in the SR case, they are obtained by replacing every occurrence of the true kernel with its FIC approximation. Their final forms are:

$μ_{\mathrm{FIC}} = K(x_{\mathrm{new}}^T, X_A \mid θ)\, B_A^{-1} K(X_A, X \mid θ)\, Λ(θ, σ^2, A)^{-1}(y - Hβ),$

$\begin{aligned} Σ_{\mathrm{FIC}} = {} & k(x_{\mathrm{new}}, x_{\mathrm{new}} \mid θ) - K(x_{\mathrm{new}}^T, X_A \mid θ)\, K(X_A, X_A \mid θ)^{-1} K(X_A, x_{\mathrm{new}}^T \mid θ) \\ & + K(x_{\mathrm{new}}^T, X_A \mid θ)\, B_A^{-1} K(X_A, x_{\mathrm{new}}^T \mid θ), \end{aligned}$

where

$\begin{aligned} B_A &= K(X_A, X_A \mid θ) + K(X_A, X \mid θ)\, Λ(θ, σ^2, A)^{-1} K(X, X_A \mid θ), \\ Λ(θ, σ^2, A) &= Ω(X \mid θ, A) + σ^2 I_n. \end{aligned}$
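A sketch of the prediction formulas, under the same assumptions as the earlier sketches: a squared exponential kernel, hypothetical data, the illustrative function name `fic_predict_terms`, and a fixed $β$ rather than a fitted one. It returns $μ_{\mathrm{FIC}}$ and $Σ_{\mathrm{FIC}}$ for one scalar test input, and the predictive mean is then $h(x_{\mathrm{new}})^T β + μ_{\mathrm{FIC}}$. Because the new input has its own index, $δ_{ij} = 0$ between it and every training input, so those cross-covariances use the SR formula while $k(x_{\mathrm{new}}, x_{\mathrm{new}} \mid θ)$ stays exact.

```python
import numpy as np


def se_kernel(a, b, sigma_f=1.0, ell=1.0):
    """Assumed kernel: squared exponential on 1-D inputs."""
    return sigma_f**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)


def fic_predict_terms(x_new, y, H, beta, X, XA, sigma2, sigma_f=1.0):
    """Return (mu_FIC, Sigma_FIC) for one scalar test input x_new."""
    KXA = se_kernel(X, XA, sigma_f)
    KAA = se_kernel(XA, XA, sigma_f)
    q_diag = np.sum(KXA * np.linalg.solve(KAA, KXA.T).T, axis=1)  # diag of K_SR
    lam = sigma_f**2 - q_diag + sigma2                            # diag of Lambda
    B_A = KAA + KXA.T @ (KXA / lam[:, None])
    KnA = se_kernel(np.array([x_new]), XA, sigma_f)  # K(x_new^T, X_A | theta), 1-by-m
    r = y - H @ beta
    mu = (KnA @ np.linalg.solve(B_A, KXA.T @ (r / lam))).item()
    Sigma = (sigma_f**2                              # k(x_new, x_new | theta)
             - KnA @ np.linalg.solve(KAA, KnA.T)
             + KnA @ np.linalg.solve(B_A, KnA.T)).item()
    return mu, Sigma


# Hypothetical data: constant basis h(x) = 1 and a fixed beta.
X = np.linspace(0.0, 10.0, 21)
y = np.sin(X)
H = np.ones((len(X), 1))
beta = np.array([0.0])
mu, Sigma = fic_predict_terms(3.7, y, H, beta, X, X[::5], sigma2=0.1)
print(beta[0] + mu, Sigma)  # predictive mean h(x_new)^T beta + mu_FIC, and Sigma_FIC
```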

References

[1] Quiñonero-Candela, J., and C. E. Rasmussen. "A Unifying View of Sparse Approximate Gaussian Process Regression." Journal of Machine Learning Research. Vol. 6, pp. 1939–1959, 2005.
