Hi Marius,
When using Gaussian Process Regression (GPR) with a large dataset in MATLAB, you can pass 'FitMethod','sd' to fitrgp to fit the model on a subset of the data points, known as the active set. This keeps the computational cost manageable because only the active set is used in training. Here's a breakdown of your questions and the options available:

ActiveSetMethod Options
- random: Selects data points randomly for the active set. This is the fastest option because it doesn't involve any optimization or criterion-based selection.
- sgma (sparse greedy matrix approximation): Uses a greedy approach to select the points that best improve a low-rank approximation of the kernel matrix. This method is more computationally intensive than random selection but aims to choose a more informative subset.
- entropy: Selects points based on maximizing the differential entropy of the predictive distribution. This method tries to choose the most informative points and is computationally expensive, which explains the longer runtime compared to random selection.
- likelihood: Greedily selects the points that most increase the log marginal likelihood, evaluated with the subset of regressors approximation. This method is also computationally intensive because the criterion has to be re-evaluated for candidate points at every selection step.
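
To compare the four options on your own machine, something along these lines may help. It is only a minimal sketch: the synthetic data, the active set size of 200, and the variable names are assumptions for illustration, not values from your problem. The name-value arguments ('FitMethod', 'PredictMethod', 'ActiveSetMethod', 'ActiveSetSize') are the documented fitrgp options.

```matlab
% Synthetic data, purely illustrative (not taken from the original question)
rng(0);                              % make the random draws reproducible
n = 5000;
X = linspace(0, 10, n)';
y = sin(X) + 0.1*randn(n, 1);

activeSetMethods = {'random', 'sgma', 'entropy', 'likelihood'};
fitTime = zeros(numel(activeSetMethods), 1);

for k = 1:numel(activeSetMethods)
    tic;
    mdl = fitrgp(X, y, ...
        'FitMethod', 'sd', ...                        % subset of data approximation
        'PredictMethod', 'sd', ...                    % predict from the active set as well
        'ActiveSetMethod', activeSetMethods{k}, ...
        'ActiveSetSize', 200);                        % assumed size; tune for your data
    fitTime(k) = toc;

    % Report timing, number of points actually selected, and in-sample error
    fprintf('%-10s  time: %6.2f s   active points: %4d   resub. loss: %.4f\n', ...
        activeSetMethods{k}, fitTime(k), nnz(mdl.IsActiveSetVector), resubLoss(mdl));
end
```

The greedy methods can stop before reaching 'ActiveSetSize', so the reported number of active points may be smaller than 200. Timings will depend on your data and hardware, so treat the comparison as indicative only.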
