How to specify a portion of dataset for cross-validation with fitrgp?

Question

Katy 2023-9-14

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/2020941-how-to-specify-a-portion-of-dataset-for-cross-validation-with-fitrgp

Answered: Katy 2023-9-29

I am using fitrgp and would like to do cross-validation using a predetermined dataset as the validation data (I have one dataset for training and another one for validation). I've read the documentation below and similar questions on this forum, but I haven't found a way to do this. Alternatively, is there a way to specify which indices of a single dataset form the training portion and which form the validation portion?

fitrgp documentation

cvpartition documentation

Any help is appreciated, thanks!


Answer 1

Katy 2023-9-29

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/2020941-how-to-specify-a-portion-of-dataset-for-cross-validation-with-fitrgp#answer_1322044

It turns out custom cross-validation partitioning is a feature available in R2023b. I was able to specify the test indices similar to this example.

https://www.mathworks.com/help/releases/R2023b/stats/cvpartition.html#mw_cbfe0131-6ee0-499c-bed3-c083dd22d047
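For readers who want to see the shape of this, here is a minimal sketch of what I understand the R2023b custom-partition workflow to look like. The data are synthetic and the variable names (Xtrain, Xval, isVal, and so on) are made up for illustration; this is not the exact code I ran. It assumes the training and validation sets are stacked into one array and that a logical vector marks the validation rows.

```matlab
% Sketch only: synthetic data stand in for the real training/validation sets.
rng(0);                                    % reproducible synthetic data
Xtrain = 10*rand(50, 1);                   % hypothetical training predictors
ytrain = sin(Xtrain) + 0.1*randn(50, 1);   % hypothetical training response
Xval   = 10*rand(15, 1);                   % hypothetical validation predictors
yval   = sin(Xval) + 0.1*randn(15, 1);     % hypothetical validation response

% Stack both sets so that a single partition object can describe the split.
X = [Xtrain; Xval];
y = [ytrain; yval];

% Logical vector: true marks the rows held out for validation.
isVal = [false(50, 1); true(15, 1)];
c = cvpartition("CustomPartition", isVal); % needs R2023b or later

% Cross-validated GPR model that uses exactly this split.
cvMdl  = fitrgp(X, y, "CVPartition", c);
valMSE = kfoldLoss(cvMdl);                 % mean squared error on the held-out rows
```

With a logical vector the partition behaves like a holdout split, so kfoldLoss reports the error on the 15 validation rows only.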

Thanks to the MathWorks Technical Support team as well for the help!


Answer 2

Maneet Kaur Bagga 2023-9-26

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/2020941-how-to-specify-a-portion-of-dataset-for-cross-validation-with-fitrgp#answer_1318987

Hi Katy,

As I understand it, to validate a "fitrgp" model on a predetermined portion of the data, you can use the "cvpartition" function to create a partition object, which provides the indices of the training and validation portions.
For instance, "cvpartition" can create a hold-out validation partition object. Set the number-of-observations argument to the number of observations in the dataset, select the "HoldOut" method, and specify the size of the validation set (X_val), either as a fraction or as a number of observations.
The training and test methods of the partition object can then be used to obtain the indices for the training and validation portions, respectively. These indices are used to select the corresponding data from the training dataset (X_train and Y_train).
Finally, use "fitrgp" to train the GP model on the training portion and "predict" to obtain predictions for the validation predictors (X_val_cv). You can then calculate performance metrics, such as mean squared error or R-squared, from the predicted values (Y_val_pred) and the actual validation targets (Y_val_cv).
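The steps above could be sketched roughly as follows. This is only an illustration on synthetic data; the variable names and sizes are assumptions rather than anything taken from your setup. Note that with the hold-out method the validation rows are chosen at random by "cvpartition".

```matlab
% Sketch only: synthetic data and hypothetical variable names.
rng(1);                                     % make the random split reproducible
X = 10*rand(65, 1);                         % hypothetical predictors
y = sin(X) + 0.1*randn(65, 1);              % hypothetical response

% Hold out 15 observations; cvpartition picks them at random.
c = cvpartition(size(X, 1), 'Holdout', 15);
idxTrain = training(c);                     % logical index of training rows
idxVal   = test(c);                         % logical index of validation rows

% Train on the training rows, predict on the validation rows.
mdl   = fitrgp(X(idxTrain, :), y(idxTrain));
yPred = predict(mdl, X(idxVal, :));

% Performance metrics on the validation rows.
residuals = y(idxVal) - yPred;
mse       = mean(residuals.^2);
ssTot     = sum((y(idxVal) - mean(y(idxVal))).^2);
rSquared  = 1 - sum(residuals.^2)/ssTot;
```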

Please refer to the following documentation for better understanding of the functions:

fitrgp

https://www.mathworks.com/help/stats/fitrgp.html#d126e462217

cvpartition

https://www.mathworks.com/help/stats/cvpartition.html

predict

https://www.mathworks.com/help/stats/linearmodel.predict.html

Hope this helps!

Thank You

Maneet Bagga


Katy 2023-9-27

Hi Maneet,

Thank you for this really detailed response! Just to follow-up on this point:

The training and test methods of the partition object can then be used to obtain the indices for the training and validation portions, respectively. These indices are used to select the corresponding data from the training dataset (X_train and Y_train).

Using this cvpartition holdout method, based on my understanding, the indices are then selected randomly by the cvpartition object even if using the number of observations in the test set rather than the fraction.

I referred to this example:

openExample('stats/EstimateNewDataClassificationUsingCrossValidationErrorExample')

and experimented with changing this line:

hpartition = cvpartition(n,'Holdout',0.3)

to an integer (5 for example below)

hpartition = cvpartition(n,'Holdout',5)

From this it seems that the indices in 'idxTrain' and 'idxNew' variables are randomly selected.

I'm hoping to find a way to manually indicate which indices to select as the training set, and which indices to select as the validation set. (i.e. idxTrain = tbl(1:50, :) and idxTest = tbl(1:15, :) for example)
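To make concrete what I mean, here is a rough sketch of a fully manual split that does not use a partition object at all. The table contents and the row ranges are made up for illustration (I have assumed non-overlapping ranges 1:50 and 51:65), and it only gives a single train/validate evaluation rather than a cross-validated model object.

```matlab
% Sketch only: synthetic table and hand-picked, non-overlapping row ranges.
rng(2);
x   = 10*rand(65, 1);                          % hypothetical predictor
tbl = table(x, sin(x) + 0.1*randn(65, 1), ...
    'VariableNames', {'x', 'y'});              % hypothetical data table

idxTrain = 1:50;                               % rows chosen by hand for training
idxVal   = 51:65;                              % rows chosen by hand for validation

% Train on the chosen training rows only.
mdl = fitrgp(tbl(idxTrain, :), 'y');

% Evaluate on the chosen validation rows.
yPred  = predict(mdl, tbl(idxVal, :));
valMSE = loss(mdl, tbl(idxVal, :), 'y');       % default loss is mean squared error
```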

Thank you again for your response!
