Gaussian Process Regression not working with normalized data or noise

I have data with around features that I am trying to fit regression models to. The data is a series of vectors in three forms: noiseless vectors, noiseless unit vectors, and noisy unit vectors. The noise applied to the vectors is Gaussian white noise.
Most models (bag ensemble, SVM, GAM, LogitBoost ensemble) perform best with the noiseless unit vector data and slightly worse with the noisy or non-unit vector data. The GPR model, on the other hand, performs the best of all the models on the noiseless non-normalized data, but completely fails with the unit vector data (it predicts the same value for everything) and mostly fails with the noisy data (it predicts almost all test values as the same number, with a few exceptions that it predicts perfectly).
Here is example code for the normalized data (all data sets are processed the same way; only the data differs):
%% Normal Data
clc; clearvars;
load threeEmbedData.mat
% turns the data and labels into 2d matrices
ylabels = reshape(yLabels,size(yLabels,1)*size(yLabels,2),3);
x = xGen(svData);
x(isnan(x)) = 0;
% splits data into training and test sets
index = randperm(length(ylabels));
percent = .8;
final = round(percent*length(ylabels));
predictors = 1000;
train = x(1:predictors,index(1:final));
test = x(1:predictors,index(final+1:end));
Ytrain = ylabels(index(1:final),2);
Ytest = ylabels(index(1+final:end),2);
train = train';
test = test';
% creating the model
mdls{1} = fitrgp(train,Ytrain);
% sorting predicted test values and comparing them to actual test values
Y = predict(mdls{1},test);
[~,I] = sort(Ytest);
figure
scatter(1:size(Ytest,1),[Ytest(I) Y(I)])
% function for reshaping sensitivity vector fields and returning the
% reshaped values for embedded data
function [X] = xGen(svVec)
    dim = size(svVec);
    X1 = zeros(dim(1)*dim(2),dim(3));
    X = zeros(dim(1)*dim(2),dim(3)*dim(4));
    for l = 1:dim(4)
        for k = 1:dim(3)
            X1(:,k) = reshape(svVec(:,:,k,l)',[dim(1)*dim(2) 1]);
        end
        X(:,dim(3)*l-dim(3)+1:dim(3)*l) = X1;
    end
end
This data generation works great, but the data with noise and the normalized data do not work at all. I have attached the results of all three below. My data files are all around 60 MB, so I cannot attach them, but if anyone knows a workaround I would love to attach them for easier troubleshooting.

Answers (1)

Neha 2023-6-27
Hi Alejandro,
I understand that the GPR model is not performing well on the noisy data and the unit vector data. I suggest optimizing the hyperparameters, starting with the kernel function: try changing it to 'matern32' or 'matern52'. If this doesn't improve the performance, fit the GPR model with other kernel choices and compare them on a validation set using predictive accuracy or a goodness-of-fit metric. Then choose the kernel that performs best according to your evaluation criteria.
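As a rough sketch of the kernel comparison described above, the following fits fitrgp with a few kernels and scores each on a held-out validation set. The data here is synthetic and purely illustrative (it is not the data from the question), the variable names are made up for this example, and the 80/20 split simply mirrors the split used in the question's code.

```matlab
% Sketch: compare several fitrgp kernels on a held-out validation set.
% The data below is synthetic and only stands in for real predictors.
rng(0);                                   % reproducible example
n = 200;
X = rand(n, 5);
y = sin(2*pi*X(:,1)) + 0.5*X(:,2) + 0.1*randn(n, 1);

% Hold out 20% of the observations for validation
idx    = randperm(n);
nTrain = round(0.8*n);
Xtrain = X(idx(1:nTrain), :);
ytrain = y(idx(1:nTrain));
Xval   = X(idx(nTrain+1:end), :);
yval   = y(idx(nTrain+1:end));

% Kernels to try (all are valid 'KernelFunction' values for fitrgp)
kernels = {'squaredexponential', 'matern32', 'matern52'};
valMSE  = zeros(numel(kernels), 1);
for k = 1:numel(kernels)
    mdl = fitrgp(Xtrain, ytrain, 'KernelFunction', kernels{k});
    valMSE(k) = loss(mdl, Xval, yval);    % mean squared error by default
end

% Pick the kernel with the lowest validation error
[~, best]  = min(valMSE);
bestKernel = kernels{best};
fprintf('Lowest validation MSE: %s (%.4g)\n', bestKernel, valMSE(best));
```

With real data, Xtrain/ytrain and Xval/yval would be replaced by your own training and validation splits. Which kernel wins depends on the data, so the result on the synthetic example says nothing about the unit vector case.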
Similarly, other hyperparameters can also be optimized through grid search.
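One possible way to run such a grid search is through the 'OptimizeHyperparameters' option of fitrgp. The sketch below is a minimal example on synthetic data; the choice to search over 'KernelFunction' and 'Sigma', and the grid size of 4, are assumptions made for illustration rather than recommended settings.

```matlab
% Sketch: grid search over fitrgp hyperparameters on synthetic data.
rng(0);                                   % reproducible example
n = 100;
X = rand(n, 3);
y = sin(2*pi*X(:,1)) + 0.1*randn(n, 1);

% Use grid search instead of the default Bayesian optimization,
% and suppress plots and command-line output
opts = struct('Optimizer', 'gridsearch', ...
              'NumGridDivisions', 4, ...
              'ShowPlots', false, ...
              'Verbose', 0);

% Search over the kernel function and the noise standard deviation
mdl = fitrgp(X, y, ...
    'OptimizeHyperparameters', {'KernelFunction', 'Sigma'}, ...
    'HyperparameterOptimizationOptions', opts);

% For grid search, the results are returned as a table of tried settings
results = mdl.HyperparameterOptimizationResults;
disp(mdl.KernelFunction);                 % kernel chosen by the search
```

The search uses cross-validation loss as its objective by default, so it can take a while on large data sets; a coarser grid or fewer hyperparameters keeps the run time down.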
I hope this helps!
