# kfoldPredict

Predict responses for observations not used for training

## Syntax

``YHat = kfoldPredict(CVMdl)``

## Description

`YHat = kfoldPredict(CVMdl)` returns cross-validated predicted responses from the cross-validated linear regression model `CVMdl`. That is, for every fold, `kfoldPredict` predicts responses for the observations that it holds out when it trains using all other observations. `YHat` contains predicted responses for each regularization strength in the linear regression models that compose `CVMdl`.

## Input Arguments

Cross-validated linear regression model, specified as a `RegressionPartitionedLinear` model object. You can create a `RegressionPartitionedLinear` model using `fitrlinear` and specifying any one of the cross-validation name-value pair arguments, for example, `CrossVal`.

To obtain estimates, `kfoldPredict` applies the same data used to cross-validate the linear regression model (`X` and `Y`).

## Output Arguments

Cross-validated predicted responses, returned as an n-by-L numeric array. n is the number of observations in the predictor data that created `CVMdl` (see `X`) and L is the number of regularization strengths in `CVMdl.Trained{1}.Lambda`. `YHat(i,j)` is the predicted response for observation `i` using the linear regression model that has regularization strength `CVMdl.Trained{1}.Lambda(j)`.

The predicted response using the model with regularization strength j is $\hat{y}_{j} = x\beta_{j} + b_{j}.$

• x is an observation from the predictor data matrix `X`, and is a row vector.

• ${\beta }_{j}$ is the estimated column vector of coefficients. The software stores this vector in `Mdl.Beta(:,j)`.

• ${b}_{j}$ is the estimated, scalar bias, which the software stores in `Mdl.Bias(j)`.
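The formula above can be sketched in base MATLAB with stand-in values. The variables `Beta` and `Bias` below are hypothetical random arrays that only mimic the shapes of `Mdl.Beta` and `Mdl.Bias`; they do not come from a fitted model, and the row-vector layout of `Bias` is an assumption made for this illustration.

```matlab
% Illustrative shapes only: these are random stand-ins, not fitted coefficients.
rng(0)                  % For reproducibility
n = 5; d = 4; L = 3;    % observations, predictors, regularization strengths
X = randn(n,d);         % each row is one observation x
Beta = randn(d,L);      % column j stands in for Mdl.Beta(:,j)
Bias = randn(1,L);      % element j stands in for Mdl.Bias(j)

% All predictions at once: implicit expansion adds Bias(j) to column j.
YHat = X*Beta + Bias;   % n-by-L

% One entry computed directly from the scalar formula.
i = 2; j = 3;
yij = X(i,:)*Beta(:,j) + Bias(j);
```

Here `YHat(i,j)` and `yij` are the same quantity, which is the sense in which column j of the output corresponds to regularization strength j.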

## Examples

Simulate 10000 observations from this model:

$y = x_{100} + 2x_{200} + e.$

• $X = \{x_{1},\ldots,x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

• e is random normal error with mean 0 and standard deviation 0.3.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Cross-validate a linear regression model.

`CVMdl = fitrlinear(X,Y,'CrossVal','on')`
```
CVMdl = 
  classreg.learning.partition.RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'

  Properties, Methods
```
`Mdl1 = CVMdl.Trained{1}`
```
Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'

  Properties, Methods
```

By default, `fitrlinear` implements 10-fold cross-validation. `CVMdl` is a `RegressionPartitionedLinear` model. It contains the property `Trained`, which is a 10-by-1 cell array holding 10 `RegressionLinear` models that the software trained using the training set.

Predict responses for observations that `fitrlinear` did not use in training the folds.

`yHat = kfoldPredict(CVMdl);`

Because there is one regularization strength in `Mdl1`, `yHat` is a numeric vector.
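To make the held-out logic concrete, the following sketch rebuilds the predictions fold by fold and compares them with the output of `kfoldPredict`. It assumes Statistics and Machine Learning Toolbox is available and relies on the `test` function of the `cvpartition` object stored in `CVMdl.Partition` and on `predict` for `RegressionLinear` models. The names `yManual`, `heldOut`, and `maxDiff` are introduced here for illustration only.

```matlab
rng(1) % For reproducibility
n = 1e4; d = 1e3; nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
CVMdl = fitrlinear(X,Y,'CrossVal','on');

% For each fold, predict only the observations that fold held out.
yManual = zeros(n,1);
for k = 1:CVMdl.KFold
    heldOut = test(CVMdl.Partition,k);   % logical index of fold k
    yManual(heldOut) = predict(CVMdl.Trained{k},X(heldOut,:));
end

% Compare with the built-in result.
yHat = kfoldPredict(CVMdl);
maxDiff = max(abs(yManual - yHat));      % expected to be at or near zero
```

Each observation appears in exactly one held-out fold, so every element of `yManual` is filled once, by a model that did not see that observation during training.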

Simulate 10000 observations as in Predict Cross-Validated Responses.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Create a set of 15 logarithmically spaced regularization strengths from $10^{-5}$ through $10^{-1}$.

`Lambda = logspace(-5,-1,15);`

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.

```
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
```

`CVMdl` is a `RegressionPartitionedLinear` model. Its `Trained` property contains a 5-by-1 cell array of trained `RegressionLinear` models, each of which holds out a different fold during training. Because `fitrlinear` trained using 15 regularization strengths, you can think of each `RegressionLinear` model as 15 models.

Predict cross-validated responses.

```
YHat = kfoldPredict(CVMdl);
size(YHat)
```
```
ans = 1×2

       10000          15
```
`YHat(2,:)`
```
ans = 1×15

   -1.7338   -1.7332   -1.7319   -1.7299   -1.7266   -1.7239   -1.7135   -1.7210   -1.7324   -1.7063   -1.6397   -1.5112   -1.2631   -0.7841   -0.0096
```

`YHat` is a 10000-by-15 matrix. `YHat(2,:)` contains the cross-validated responses for observation 2, one for each of the 15 regularization strengths.
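One possible use of the n-by-L output is to compare regularization strengths by their out-of-fold error. The sketch below repeats the setup of this example and computes a mean squared error for each column of `YHat`. It is an illustration under the same assumptions as the example, not a prescribed model-selection procedure; the names `mse`, `idx`, and `bestLambda` are introduced here.

```matlab
rng(1) % For reproducibility
n = 1e4; d = 1e3; nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);
CVMdl = fitrlinear(X',Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
YHat = kfoldPredict(CVMdl);

% Out-of-fold mean squared error for each regularization strength.
% Implicit expansion subtracts each column of YHat from Y.
mse = mean((Y - YHat).^2,1);   % 1-by-15 row vector

% Regularization strength with the smallest out-of-fold error.
[~,idx] = min(mse);
bestLambda = Lambda(idx);
```

Because each row of `YHat` comes from a model that never saw that observation, `mse(j)` estimates the generalization error at `Lambda(j)`. The `kfoldLoss` function offers a built-in route to a cross-validated loss and can serve as a cross-check.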