Understanding Gaussian Process Regression in Regression Learner App

Question

Georgi 2024-8-3

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/2142666-understanding-gaussian-process-regression-in-regression-learner-app

Answered: Image Analyst 2024-8-3

Hi everyone,

I'm having some trouble understanding the Gaussian Process Regression (GPR) options in the Regression Learner App. There are three main choices for GPR models:

Predefined Kernel: I can directly choose a kernel (Rational Quadratic, Squared Exponential, Matern 5/2, or Exponential) if I know which one suits my data best.
All GPR Models (non-optimizable): If I'm unsure which kernel to use, I can select this option to try all non-optimizable GPR models.
Optimizable GPR: This option allows the hyperparameters to be optimized and makes even more kernels available.

Regardless of whether I choose the optimizable or non-optimizable version, each kernel has hyperparameters that can be tuned.

Here are my questions:

Automatic Hyperparameter Selection: As I understand it, in the standard model without optimization the kernel parameters (hyperparameters) are selected automatically and are an initial best guess rather than optimal values. If so, how are they estimated or selected, and why aren't they optimal?
Optimization Process: When the optimization option is selected, the hyperparameters are found by maximizing the log marginal likelihood function. So they are not guessed; they are the hyperparameters that best describe the data. Is finding those best values essentially what the optimization process does?
Choosing Non-Optimal Hyperparameters: Why would someone use the version without the best hyperparameters? One reason I've encountered is that optimization can take much longer on the same data. Are there other reasons to choose non-optimizable models?
Best Practices: Are there standard practices or guidelines for when to use each version of the GPR models? Or does one usually start with the non-optimizable version and move to the optimizable one if the results are not good enough?
Output Differences: Both versions output forecasted data and the covariance matrix. Does the optimized version provide any additional information or benefits?
Plotting Covariance: How can I plot the covariance that quantifies the uncertainty of my predictions? This is one of the main advantages of using GPR, and I want to visualize it. I have found some sample code at https://ch.mathworks.com/help/stats/gaussian-process-regression-models.html but I am a bit confused, because my input is a table, so I have several predictors rather than just one. So far I have always used "predictedData = trainedModel.predictFcn(Table);", but this gives me only the forecasted data without the 95% prediction intervals.

I'm generally confused about the differences between both versions and the results they produce. Any insights or explanations would be greatly appreciated.

Thanks in advance!

Best regards,

Georgi

Answer 1

Image Analyst 2024-8-3

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/2142666-understanding-gaussian-process-regression-in-regression-learner-app#answer_1494261

I always suggest trying all the models and then picking the best one. I use the default parameters and don't try to fine-tune them for better performance, but you're welcome to.
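
For anyone who wants to compare default and tuned settings outside the app, the following is a minimal command-line sketch using fitrgp. It assumes the app's presets behave roughly like the corresponding fitrgp calls; the synthetic table, the variable names x1, x2, and y, and the small evaluation budget are all hypothetical choices for illustration.

```matlab
rng(1);                                   % reproducible synthetic data (hypothetical)
n = 150;
T = table(rand(n,1), rand(n,1), 'VariableNames', {'x1','x2'});
T.y = sin(4*T.x1) + 0.5*T.x2 + 0.1*randn(n,1);

% Fixed kernel choice. With the default 'FitMethod' for a data set this small,
% fitrgp still estimates the kernel parameters and noise level by maximizing
% the marginal log likelihood, starting from heuristic initial values.
mdlDefault = fitrgp(T, 'y', 'KernelFunction', 'rationalquadratic', ...
    'Standardize', true);

% Bayesian optimization over a wider set of options (kernel function, basis
% function, kernel scale, sigma, standardization), scored by cross-validation loss.
mdlTuned = fitrgp(T, 'y', 'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations', 15, 'ShowPlots', false, 'Verbose', 0));

% Compare 5-fold cross-validated mean squared error for the two models.
cvDefault = kfoldLoss(crossval(mdlDefault, 'KFold', 5));
cvTuned   = kfoldLoss(crossval(mdlTuned,   'KFold', 5));
fprintf('Default MSE: %.4f, tuned MSE: %.4f\n', cvDefault, cvTuned);
```

The tuned model is not guaranteed to win on every data set, and it takes noticeably longer to fit, which is one practical reason to start with the defaults.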

When you specify your input parameters (predictors) on that tab you can select which columns from your table are predictor variables. For the response variable, you can either have a separate column vector, or it can be one of the columns in your table.
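
The same two ways of supplying the response can be sketched at the command line with fitrgp. This is only an illustration on synthetic data; the column names x1, x2, unused, and y are hypothetical.

```matlab
rng(2);                                   % reproducible synthetic data (hypothetical)
n = 100;
T = table(rand(n,1), rand(n,1), rand(n,1), ...
    'VariableNames', {'x1','x2','unused'});
T.y = 2*T.x1 - T.x2 + 0.05*randn(n,1);

% Response stored as a column of the table; the formula selects the predictors.
mdlA = fitrgp(T, 'y ~ x1 + x2');

% Response supplied as a separate column vector; predictors come from a sub-table.
y = T.y;
mdlB = fitrgp(T(:, {'x1','x2'}), y);

disp(mdlA.PredictorNames);                % predictors used by the formula model
disp(mdlB.PredictorNames);                % predictors used by the sub-table model
```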

I don't remember all the visualization options, but after the modeling has finished you can click on one of the models and have it show a scatter plot of your predictions versus your true values. That's what I always look at. Sort the results by a metric such as RMSE or MSE and pick the model with the lowest value.
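
On the question of prediction intervals with a table input: the predict function for a GPR model can return a per-observation standard deviation and interval bounds as extra outputs (not a full covariance matrix). The sketch below fits a model directly with fitrgp on a synthetic table and, because there are several predictors, plots the interval against the observation index. The data and variable names are hypothetical, and the commented line for an exported model assumes the exported struct stores the model in a field named RegressionGP, which is worth checking with fieldnames(trainedModel).

```matlab
rng(0);                                   % reproducible synthetic data (hypothetical)
n = 200;
T = table(rand(n,1), rand(n,1), 'VariableNames', {'x1','x2'});
T.y = sin(4*T.x1) + 0.5*T.x2 + 0.1*randn(n,1);

% Fit a GPR model directly from the table.
gprMdl = fitrgp(T, 'y', 'KernelFunction', 'matern52', 'Standardize', true);

% Request the standard deviation and the 95% prediction intervals (Alpha = 0.05).
Tnew = T(1:50, {'x1','x2'});
[yPred, ySD, yInt] = predict(gprMdl, Tnew, 'Alpha', 0.05);

% For a model exported from the app (assumption: field name RegressionGP):
% [yPred, ySD, yInt] = predict(trainedModel.RegressionGP, Table);

% With several predictors, plot against the observation index.
idx = (1:height(Tnew))';
figure; hold on
fill([idx; flipud(idx)], [yInt(:,1); flipud(yInt(:,2))], ...
    [0.85 0.85 0.85], 'EdgeColor', 'none');
plot(idx, yPred, 'b-', 'LineWidth', 1.5);
plot(idx, T.y(1:50), 'k.');
hold off
xlabel('Observation index'); ylabel('Response');
legend('95% prediction interval', 'Predicted', 'Observed');
```

If the app applied PCA or feature selection before training, the exported predictFcn does extra preprocessing, so calling predict on the inner model directly may not match it.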

If you want to upload your data in a .mat file, I could experiment with it myself.