Formatting input data for linear regression model in leave-out-one validation testing

Question

Isabelle Museck 2024-7-25

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/2140171-formatting-input-data-for-linear-regression-model-in-leave-out-one-validation-testing

Answered: Gayathri 2024-8-8

Hello there. I have data from 10 trials stored in a 10x1 cell (Predictors) and the corresponding response variables stored in a 10x1 cell (Response). I am trying to train a simple linear regression model using a leave-one-out approach: in each iteration one trial is held out, the other 9 trials are used to train the model, and the held-out trial is used to test it by computing an RMSE value. I am unsure how to format my input to the "fitlm" function, as I keep getting the following error:

% Train the network
for i = 1:length(Predictors) %iterate over all data points
    validationdataX = Predictors(i);
    validationdataY = Response(i);
    %Exclude the current index (i) for training
    trainingIndices = setdiff(1:length(Predictors),i);
    traningdataX = Predictors(trainingIndices)
    trainingdataY = Response(trainingIndices)
    net = fitlm(traningdataX,trainingdataY)
    ypred = predict(net,validationdataX);
    TrueVal = validationdataY;
    TrueValue = cell2mat(TrueVal);
    Predvalue = {Predval};
    PredictedValue = cell2mat(Predvalue);
    RMSE = rmse(PredictedValue,TrueValue)
end

Error using classreg.regr.TermsRegression/handleDataArgs (line 589)
Predictor variables must be numeric vectors, numeric matrices, or categorical vectors.
Error in LinearModel.fit (line 1000)
    [X,y,haveDataset,otherArgs] = LinearModel.handleDataArgs(X,paramNames,varargin{:});
Error in fitlm (line 134)
    model = LinearModel.fit(X,varargin{:});

Any suggestions on how to fix this, get the model to work correctly, and make predictions using the leave-one-out validation approach would be greatly appreciated!


Umar 2024-7-25

Hi Isabelle,

That sounds like an interesting project. In your code, you are passing cell arrays as predictors, which is causing the error. To resolve this, convert the cell arrays to numeric arrays before fitting the linear model. I have updated the code accordingly, keeping the leave-one-out validation approach. Here is an example snippet:

% Define sample data for 'data' and 'responseData'
data = {1, 2, 3, 4, 5};               % Sample predictor data
responseData = {10, 20, 30, 40, 50};  % Sample response data

% Populate the 'Predictors' variable with the sample data
Predictors = cell(1, length(data));
for i = 1:length(data)
    Predictors{i} = data{i};
end

% Populate the 'Response' variable with the sample data
Response = cell(1, length(responseData));
for i = 1:length(responseData)
    Response{i} = responseData{i};
end

% Train the linear regression model with leave-one-out cross-validation
for i = 1:length(Predictors)
    % Extract validation data for the current iteration
    validationdataX = cell2mat(Predictors(i));
    validationdataY = cell2mat(Response(i));

    % Exclude the current index (i) for training
    trainingIndices = setdiff(1:length(Predictors), i);

    % Transpose so that each row is one observation, as fitlm expects
    trainingdataX = cell2mat(Predictors(trainingIndices))';
    trainingdataY = cell2mat(Response(trainingIndices))';

    % Train the linear regression model
    mdl = fitlm(trainingdataX, trainingdataY);

    % Make predictions on the validation data
    ypred = predict(mdl, validationdataX);

    % Calculate RMSE for the current iteration
    RMSE = sqrt(mean((ypred - validationdataY).^2));

    % Display RMSE for each iteration
    disp(['RMSE for iteration ', num2str(i), ': ', num2str(RMSE)]);
end

I hope this is what you are looking for. Please see the attached results.

Please let me know if you have any further questions.
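The error in the question comes down to how the cell arrays are indexed. As a minimal sketch with made-up 2x3 data (the sizes are chosen only for illustration), parentheses return a smaller cell array, which fitlm rejects, while curly braces return the numeric contents:

```matlab
% Hypothetical toy data: two trials, each a 2x3 numeric matrix
Predictors = {rand(2, 3); rand(2, 3)};

asCell    = Predictors(1);   % parentheses: still a 1x1 cell array
asNumeric = Predictors{1};   % curly braces: the 2x3 numeric matrix inside

disp(class(asCell))     % cell
disp(class(asNumeric))  % double

% Several cells selected with parentheses are still a cell array;
% concatenating their contents gives a numeric matrix instead
pooled = [Predictors{[1 2]}];  % 2x6 numeric matrix
disp(size(pooled))
```

The same distinction applies to Response: anything handed to fitlm or predict has to be the numeric content, not the cell that wraps it.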

Umar 2024-7-25

Hi Isabelle,

This is a common issue when working with regression models. In your case, the training data matrix trainingdataX is of size 567x541, while trainingdataY is of size 9x541. fitlm requires the number of observations (rows) in the predictor matrix to match the number of observations in the response, and this discrepancy in dimensions is causing the error. To resolve it, you need to ensure that the predictors and the response have the same number of observations. A first step is to transpose trainingdataY so that its 541 time steps run along the rows. Here is an example illustrating this:

% Example: transpose trainingdataY so that time steps run along its rows
trainingdataX = randn(567, 541); % Example predictor matrix
trainingdataY = randn(9, 541);   % Example response matrix

% Transpose the response matrix
trainingdataY = trainingdataY';

% Verify the dimensions after transposing
size(trainingdataX) % 567 541
size(trainingdataY) % 541 9

After transposing, trainingdataY has its 541 time steps along the rows. Keep in mind that fitlm expects one row per observation in both inputs, so trainingdataX has to be arranged with the same number of rows as the response. Please let me know if this helps resolve your problem.
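One way to give both inputs the same number of rows is to treat every time step of every trial as one observation. The sketch below assumes that the 567 rows of trainingdataX are 9 trials of 63 features stacked in consecutive blocks (567 = 9 x 63) and that row t of trainingdataY belongs to trial t; the names Xobs and y are placeholders, and the data is random:

```matlab
nTrials = 9; nFeat = 63; nSteps = 541;      % assumed layout, see lead-in
trainingdataX = randn(nFeat*nTrials, nSteps);
trainingdataY = randn(nTrials, nSteps);

% Split the stacked rows into (feature, trial, time step)
X3 = reshape(trainingdataX, nFeat, nTrials, nSteps);

% Reorder to (time step, trial, feature) and flatten the first two
% dimensions, so rows run through trial 1's time steps, then trial 2's, ...
Xobs = reshape(permute(X3, [3 2 1]), nSteps*nTrials, nFeat);

% Flatten the response in the same trial-by-trial order
y = reshape(trainingdataY', [], 1);

disp(size(Xobs))  % 4869 63
disp(size(y))     % 4869 1
```

With this arrangement, Xobs and y have matching row counts and can be passed as fitlm(Xobs, y).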

Isabelle Museck 2024-7-29

Hi Umar, I appreciate your response. I am interested in comparing a simple linear regression model to other models that I have built. To compare them fairly, I want to keep the number of features the same, despite the challenges of overfitting, computational complexity, and noise. Could you guide me on how to achieve this within my code? How can I input the predictor data (63 features x 541 time steps from each of the 9 trials) and the response data (1 response variable x 541 time steps from each of the 9 trials) into a linear model without getting errors from mismatched dimensions?

Umar 2024-7-29

Hi @Isabelle Museck,

To input the predictor and response data into a linear model without dimension mismatch errors, you have to make sure that the dimensions of the data align correctly. You can modify the data handling part of your code as follows:

% Train the model
% Assumes Predictors is a numeric (features x timesteps) matrix and
% Response is a numeric (1 x timesteps) row vector
for i = 1:size(Predictors, 2) % iterate over all timesteps (columns)
    validationdataX = Predictors(:, i); % Use all features for the current timestep
    validationdataY = Response(:, i);   % Use the response variable for the current timestep

    % Exclude the current index (i) for training
    trainingIndices = setdiff(1:size(Predictors, 2), i);
    trainingdataX = Predictors(:, trainingIndices); % Use all features for training data
    trainingdataY = Response(:, trainingIndices);   % Use response variable for training data

    net = fitlm(trainingdataX', trainingdataY'); % Fit linear model (rows = observations)
    ypred = predict(net, validationdataX');      % Predict using the model
    TrueValue = validationdataY';
    PredictedValue = ypred';
    RMSE = rmse(PredictedValue, TrueValue);      % Calculate RMSE
end

Please bear in mind that this is an example snippet, and you will have to customize it for your data. Please let me know if you have any further questions.
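For the layout described in the question (cell arrays holding one 63x541 predictor matrix and one 1x541 response per trial), leave-one-trial-out training could look like the sketch below. It pools every time step of the training trials as a separate observation, which is an assumption about how the trials should be combined. The random data and the name foldRMSE are placeholders.

```matlab
% Hypothetical stand-in data: 10 trials, each 63 features x 541 time steps
nTrials = 10;
Predictors = cell(nTrials, 1);
Response = cell(nTrials, 1);
for k = 1:nTrials
    Predictors{k} = randn(63, 541);
    Response{k} = randn(1, 541);
end

foldRMSE = zeros(nTrials, 1);
for i = 1:nTrials
    trainingIndices = setdiff(1:nTrials, i);

    % Concatenate the training trials along time, then transpose so that
    % each row is one time step (observation) and each column one feature
    trainingdataX = [Predictors{trainingIndices}]';  % (9*541) x 63
    trainingdataY = [Response{trainingIndices}]';    % (9*541) x 1

    % Held-out trial, oriented the same way
    validationdataX = Predictors{i}';                % 541 x 63
    validationdataY = Response{i}';                  % 541 x 1

    mdl = fitlm(trainingdataX, trainingdataY);
    ypred = predict(mdl, validationdataX);
    foldRMSE(i) = sqrt(mean((ypred - validationdataY).^2));
end

disp(foldRMSE')       % one RMSE per held-out trial
disp(mean(foldRMSE))  % average across the folds
```

All 63 features are kept, so the comparison with other models uses the same inputs. Storing one RMSE per fold also makes it easy to report the spread across trials rather than only the last value.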


Answer 1

Gayathri 2024-8-8

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/2140171-formatting-input-data-for-linear-regression-model-in-leave-out-one-validation-testing#answer_1496394


Hi @Isabelle Museck,

I ran the code in MATLAB R2024a. I can see that the issue of passing a cell array as input has been resolved in the comments. From the comments, I understand that the new issue is passing the data to the "fitlm" function without reducing the number of features through dimensionality reduction.

As I understand it, your data consists of nine arrays of dimension 63x541, each with a corresponding response of dimension 1x541. Since the cell array input issue has already been solved in the comments, I am taking "Predictors" and "Response" to be two random matrices drawn from a normal distribution as input data. Each trial's 63x541 predictor block is transposed before being passed to "fitlm", so that the function receives 541 observations of 63 features per trial, and the matching 1x541 response is transposed into a column vector. This approach fits the data as it is, without any dimensionality reduction.

Please see the code below for reference.

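Predictors = randn(567, 541); % 9 trials x 63 features stacked in rows, 541 time steps
Response = randn(9, 541);     % one row of 541 time steps per trial
for i = 0:8 % iterate over all trials, holding out trial i+1
    validationdataX = Predictors(63*i+1:63*(i+1), :);
    validationdataY = Response(i+1, :);
    Predictors1 = Predictors;
    Predictors1(63*i+1:63*(i+1), :) = [];
    Response1 = Response;
    Response1(i+1, :) = [];
    % Pool the 8 remaining trials into one training set (rows = observations),
    % so the model is fitted on all of them rather than only the last one
    trainingdataX = zeros(8*541, 63);
    for j = 0:7
        trainingdataX(541*j+1:541*(j+1), :) = Predictors1(63*j+1:63*(j+1), :)';
    end
    trainingdataY = reshape(Response1', [], 1); % same trial-by-trial row order as trainingdataX
    model = fitlm(trainingdataX, trainingdataY);
    ypred = predict(model, validationdataX');
    TrueValue = validationdataY';
    RMSE = rmse(ypred, TrueValue)
end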
