Formatting input data for linear regression model in leave-out-one validation testing

Hello there, I have data from 10 trials stored in a 10x1 cell (Predictors) and the corresponding response variables stored in a 10x1 cell (Response). I am trying to train a simple linear regression model with leave-one-out validation: train on 9 trials, then use the remaining trial to test the model and produce an RMSE value. I am unsure how to format my input for the "fitlm" function. Here is my code, followed by the error I keep getting:
% Train the network
for i = 1:length(Predictors) %iterate over all data points
validationdataX = Predictors(i);
validationdataY = Response(i);
%Exclude the current index (i) for training
trainingIndices = setdiff(1:length(Predictors),i);
traningdataX = Predictors(trainingIndices)
trainingdataY = Response(trainingIndices)
net = fitlm(traningdataX,trainingdataY)
ypred = predict(net,validationdataX);
TrueVal = validationdataY;
TrueValue = cell2mat(TrueVal);
Predvalue = {Predval};
PredictedValue = cell2mat(Predvalue);
RMSE = rmse(PredictedValue,TrueValue)
end
Error using classreg.regr.TermsRegression/handleDataArgs (line 589)
Predictor variables must be numeric vectors, numeric matrices, or categorical vectors.
Error in LinearModel.fit (line 1000)
[X,y,haveDataset,otherArgs] = LinearModel.handleDataArgs(X,paramNames,varargin{:});
Error in fitlm (line 134)
model = LinearModel.fit(X,varargin{:});
Any suggestions on how to fix this, get the model to work correctly, and make predictions using the leave-one-out validation approach would be greatly appreciated!
5 comments
Isabelle Museck, 2024-7-26, 11:57
Hi Umar,
Thank you so much for your response. Unfortunately that did not fix the issue. I think the cause is that I am trying to use 63 input features to predict one continuous variable, as you can see here in the Predictors and Response cells:
This may be causing the differences in dimensions seen here in the training X and Y data:
Is there a way to account for this in the code, so that the 63 features x 541 time steps of each trial are used to properly train the model, predict the continuous variable in the Response cell, and obtain an RMSE value for each trial?
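One possible way to handle this, sketched under assumptions: the error message says fitlm needs numeric vectors or matrices, so the cell contents have to be unpacked before the call, with one observation per row. The sketch below assumes each Predictors{k} is a 63 x 541 numeric matrix (features x time steps), each Response{k} is a 1 x 541 row vector, and each time step counts as one observation. The thread does not confirm the response layout. The synthetic data and the names Xtrain, Ytrain, Xtest, Ytest, and mdl are illustrative only, and RMSE is computed by hand rather than with rmse.

```matlab
% Synthetic stand-in data (assumption): 10 trials, each 63 features x 541 time steps
rng(0);
numTrials = 10; numFeatures = 63; numSteps = 541;
Predictors = cell(numTrials,1);
Response = cell(numTrials,1);
w = randn(numFeatures,1);                       % hypothetical true weights
for k = 1:numTrials
    Predictors{k} = randn(numFeatures,numSteps);
    Response{k} = w' * Predictors{k} + 0.1*randn(1,numSteps);
end

RMSE = zeros(numTrials,1);
for i = 1:numTrials
    trainIdx = setdiff(1:numTrials,i);          % leave trial i out

    % Unpack the cells with {} and join the 9 training trials along time,
    % then transpose so that rows are observations and columns are features
    Xtrain = cat(2, Predictors{trainIdx})';     % (9*541) x 63
    Ytrain = cat(2, Response{trainIdx})';       % (9*541) x 1
    Xtest  = Predictors{i}';                    % 541 x 63
    Ytest  = Response{i}';                      % 541 x 1

    mdl = fitlm(Xtrain, Ytrain);                % numeric inputs, not cells
    Ypred = predict(mdl, Xtest);
    RMSE(i) = sqrt(mean((Ypred - Ytest).^2));   % RMSE for the held-out trial
end
disp(RMSE)
```

Indexing with parentheses, as in Predictors(i), returns a 1x1 cell, while braces, as in Predictors{i}, return the numeric matrix inside it. That difference is the likely source of the "Predictor variables must be numeric" error.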
Umar, about 12 hours ago
Hi Isabelle,
Consider dimensionality reduction, such as Principal Component Analysis (PCA), or a feature selection method to reduce the number of input features while retaining the relevant information. This can help align the dimensions of your input data with the response variable and may improve the accuracy of your predictions. You can also reshape your input data so that each trial is in the format the model expects for its 63 features x 541 time steps, which ensures the model receives consistent input dimensions during training and prediction. After making these adjustments, train the model on the modified data and evaluate it by calculating the root mean squared error (RMSE) for each trial. The RMSE shows how well the model predicts the continuous variable from the input features. Hope this answers your question. Please let me know if you have any further questions.
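As a rough illustration of the PCA suggestion, here is a minimal sketch, not a tested recipe for the real data. It assumes each trial is a 63 x 541 numeric matrix with a 1 x 541 response, uses synthetic low-rank data, and picks numComponents = 5 arbitrarily. The names A, v, P, and mu are hypothetical. The PCA is fitted on the training trials only and then applied to the held-out trial, so no test information leaks into the model.

```matlab
% Synthetic stand-in data (assumption): 63 features driven by 5 latent signals
rng(1);
numTrials = 10; numFeatures = 63; numSteps = 541; numLatent = 5;
A = randn(numFeatures,numLatent);               % hypothetical mixing matrix
v = randn(numLatent,1);                         % hypothetical response weights
Predictors = cell(numTrials,1);
Response = cell(numTrials,1);
for k = 1:numTrials
    Z = randn(numLatent,numSteps);
    Predictors{k} = A*Z + 0.05*randn(numFeatures,numSteps);
    Response{k} = v'*Z + 0.1*randn(1,numSteps);
end

numComponents = 5;                              % arbitrary choice for this sketch
RMSE = zeros(numTrials,1);
for i = 1:numTrials
    trainIdx = setdiff(1:numTrials,i);
    Xtrain = cat(2, Predictors{trainIdx})';     % (9*541) x 63
    Ytrain = cat(2, Response{trainIdx})';       % (9*541) x 1
    Xtest  = Predictors{i}';                    % 541 x 63
    Ytest  = Response{i}';                      % 541 x 1

    % Fit PCA on training data only; pca centers the data by default
    mu = mean(Xtrain,1);
    coeff = pca(Xtrain);
    P = coeff(:,1:numComponents);               % 63 x numComponents projection

    % Project both sets with the training mean and training components
    mdl = fitlm((Xtrain - mu)*P, Ytrain);
    Ypred = predict(mdl, (Xtest - mu)*P);
    RMSE(i) = sqrt(mean((Ypred - Ytest).^2));
end
disp(RMSE)
```

Whether PCA helps on the real trials depends on how correlated the 63 features are. With thousands of pooled time steps, fitlm can also fit all 63 features directly.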


Answers (0)
