B = ridge(y,X,k) returns coefficient estimates for ridge regression models of the predictor data X and the response y. Each column of B corresponds to a particular ridge parameter k. By default, the function computes B after centering and scaling the predictors to have mean 0 and standard deviation 1. Because the model does not include a constant term, do not add a column of 1s to X.
B = ridge(y,X,k,scaled) specifies the scaling for the coefficient estimates in B. When scaled is 1 (the default), ridge does not restore the coefficients to the original data scale. When scaled is 0, ridge restores the coefficients to the scale of the original data. For more information, see Coefficient Scaling.
Ridge Regression

Perform ridge regression for a range of ridge parameters and observe how the coefficient estimates change.
Load the acetylene data set.

load acetylene

acetylene contains observations for the predictor variables x1, x2, and x3, and the response variable y.
Plot the predictor variables against each other. Observe any correlation between the variables.
plotmatrix([x1 x2 x3])
For example, note the linear correlation between x1 and x3.
Compute coefficient estimates for a multilinear model with interaction terms, for a range of ridge parameters. Use x2fx to create interaction terms and ridge to perform ridge regression.

X = [x1 x2 x3];
D = x2fx(X,'interaction');
D(:,1) = []; % No constant term
k = 0:1e-5:5e-3;
B = ridge(y,D,k);
Plot the ridge trace.
figure
plot(k,B,'LineWidth',2)
ylim([-100 100])
grid on
xlabel('Ridge Parameter')
ylabel('Standardized Coefficient')
title('Ridge Trace')
legend('x1','x2','x3','x1x2','x1x3','x2x3')
The estimates stabilize to the right of the plot. Note that the coefficient of the x2x3 interaction term changes sign as the ridge parameter increases.
Predict Values Using Ridge Regression
Predict miles per gallon (MPG) values using ridge regression.
Load the carbig data set.

load carbig
X = [Acceleration Weight Displacement Horsepower];
y = MPG;
Split the data into training and test sets.
n = length(y);
rng('default') % For reproducibility
c = cvpartition(n,'HoldOut',0.3);
idxTrain = training(c,1);
idxTest = ~idxTrain;
Find the coefficients of a ridge regression model (with k = 5).
k = 5;
b = ridge(y(idxTrain),X(idxTrain,:),k,0);
Predict MPG values for the test data using the model.
yhat = b(1) + X(idxTest,:)*b(2:end);
Compare the predicted values to the actual miles per gallon (MPG) values using a reference line.
scatter(y(idxTest),yhat)
hold on
plot(y(idxTest),y(idxTest))
xlabel('Actual MPG')
ylabel('Predicted MPG')
hold off
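The same fit-and-predict pattern can be sketched in NumPy. This is a hedged illustration only: synthetic data stands in for the carbig variables, and a closed-form fit on centered and scaled training predictors, restored to the data scale, plays the role of ridge(y,X,k,0); the standardization mirrors the doc's description, not MATLAB's exact internals.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for the carbig variables (hypothetical data, for illustration only)
n = 200
X = rng.normal(size=(n, 4)) * np.array([2.0, 800.0, 100.0, 40.0]) \
    + np.array([15.0, 3000.0, 200.0, 100.0])
y = X @ np.array([1.2, -0.01, 0.03, -0.05]) + 40 + rng.normal(scale=2.0, size=n)

# Hold out 30% of the observations for testing
idx = rng.permutation(n)
n_test = int(0.3 * n)
test, train = idx[:n_test], idx[n_test:]

# Ridge fit (k = 5) on centered/scaled training predictors, restored to the data scale
k = 5
m = X[train].mean(axis=0)
s = X[train].std(axis=0, ddof=1)
Z = (X[train] - m) / s
yc = y[train] - y[train].mean()
B1 = np.linalg.solve(Z.T @ Z + k * np.eye(4), Z.T @ yc)
b = np.concatenate([[y[train].mean() - m @ (B1 / s)], B1 / s])

# Predict as in the MATLAB example: yhat = b(1) + X(idxTest,:)*b(2:end)
yhat = b[0] + X[test] @ b[1:]
```

The first element of b acts as the constant term, which is why the MATLAB example adds b(1) separately before multiplying the test predictors by the remaining coefficients.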
y — Response data
Response data, specified as an n-by-1 numeric vector, where n is the number of observations.
X — Predictor data
Predictor data, specified as an n-by-p numeric matrix. The rows of X correspond to the n observations, and the columns of X correspond to the p predictor variables.
k — Ridge parameters
Ridge parameters, specified as a numeric vector.
Example: [0.2 0.3 0.4 0.5]
scaled — Scaling flag
1 (default) | 0

Scaling flag that determines whether the coefficient estimates in B are restored to the scale of the original data, specified as either 0 or 1. If scaled is 0, then ridge performs this additional transformation. In this case, B contains p+1 coefficients for each value of k, with the first row of B corresponding to a constant term in the model. If scaled is 1, then the software omits the additional transformation, and B contains p coefficients without a constant term coefficient.
B — Coefficient estimates

Coefficient estimates, returned as a numeric matrix. Each column of B contains the estimates for one value of the ridge parameter in k.
Ridge regression is a method for estimating coefficients of linear models that include linearly correlated predictors.
Coefficient estimates for multiple linear regression models rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix (XᵀX)⁻¹ is close to singular. Therefore, the least-squares estimate

B = (XᵀX)⁻¹Xᵀy

is highly sensitive to random errors in the observed response y, producing a large variance. This situation of multicollinearity can arise, for example, when you collect data without an experimental design.
Ridge regression addresses the problem of multicollinearity by estimating regression coefficients using

B = (XᵀX + kI)⁻¹Xᵀy

where k is the ridge parameter and I is the identity matrix. Small, positive values of k improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of ridge estimates often results in a smaller mean squared error when compared to least-squares estimates.
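The closed form above can be sketched directly in NumPy. This is an illustration of the formula with synthetic collinear predictors, not the ridge implementation itself (which also centers and scales the data first).

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated predictors: x2 is nearly a copy of x1, so X'X is close to singular.
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)

def ridge_closed_form(X, y, k):
    # B = (X'X + kI)^(-1) X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

b_ols = ridge_closed_form(X, y, 0.0)    # least squares: unstable under collinearity
b_ridge = ridge_closed_form(X, y, 1.0)  # small positive k: shrunken, stable estimates
```

Because adding kI shrinks every eigencomponent of the least-squares solution, the ridge estimate always has a smaller norm than the least-squares estimate for k > 0.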
For a given value of λ, a nonnegative parameter, ridge solves the problem

min over β0 and β of: (1/(2N)) Σᵢ₌₁ᴺ (yᵢ − β0 − xᵢᵀβ)² + λ Σⱼ₌₁ᵖ βⱼ²

where:
N is the number of observations.
yi is the response at observation i.
xi is the data, a vector of length p at observation i.
λ is a nonnegative regularization parameter, corresponding to one value of k.
The parameter β0 is a scalar, and the parameter β is a vector of length p.
The ridge problem represents the L2 regularization element of elastic net.
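As a numerical sanity check (a NumPy sketch under the assumption that the objective takes the penalized form above, with an unpenalized intercept), setting the gradient to zero yields a linear system, and perturbing the resulting solution never decreases the objective:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 60, 3
X = rng.normal(size=(N, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=N)
lam = 0.1

def objective(b0, b):
    # (1/(2N)) * sum of squared residuals + lambda * sum of squared slopes
    r = y - b0 - X @ b
    return r @ r / (2 * N) + lam * (b @ b)

# Zero gradient: (Z'Z/N + 2*lam*D) theta = Z'y/N, with D penalizing only the slopes
Z = np.column_stack([np.ones(N), X])
D = np.diag([0.0] + [1.0] * p)
theta = np.linalg.solve(Z.T @ Z / N + 2 * lam * D, Z.T @ y / N)
b0_hat, b_hat = theta[0], theta[1:]
f_min = objective(b0_hat, b_hat)

# Convexity check: any perturbation of the solution increases the objective.
all_ok = all(
    objective(b0_hat + d[0], b_hat + d[1:]) >= f_min - 1e-12
    for d in 0.5 * rng.normal(size=(100, p + 1))
)
```

Because the penalized objective is convex in (β0, β), the stationary point of the linear system is the global minimizer.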
The scaling of the coefficient estimates for the ridge regression models depends on the value of the scaled input argument.
Suppose the ridge parameter k is equal to 0. The coefficients returned when scaled is equal to 1 are estimates of the bi^1 in the multilinear model

y − μy = b1^1 z1 + ... + bp^1 zp + ε

where zi = (xi − μi)/σi are the centered and scaled predictors, y − μy is the centered response, and ε is an error term. You can rewrite the model as

y = b0^0 + b1^0 x1 + ... + bp^0 xp + ε

with b0^0 = μy − Σᵢ₌₁ᵖ bi^1 μi/σi and bi^0 = bi^1/σi. The bi^0 terms correspond to the coefficients returned by ridge when scaled is equal to 0.
More generally, for any value of k, if B1 = ridge(y,X,k,1), then

m = mean(X);
s = std(X,0,1)';
B1_scaled = B1./s;
B0 = [mean(y)-m*B1_scaled; B1_scaled]

where B0 = ridge(y,X,k,0).
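The restoration formula can be checked in NumPy. This is a sketch mirroring the doc's description of centering and scaling, not MATLAB's exact internals; for k = 0, the restored coefficients must match ordinary least squares with an intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 5.0, 0.2]) + np.array([0.0, 10.0, -3.0])
y = X @ np.array([2.0, -0.5, 4.0]) + 1.5 + rng.normal(scale=0.1, size=n)

m = X.mean(axis=0)
s = X.std(axis=0, ddof=1)          # like std(X,0,1) in MATLAB
Z = (X - m) / s                    # centered and scaled predictors
yc = y - y.mean()                  # centered response

k = 0.0                            # k = 0: ridge reduces to least squares
B1 = np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ yc)   # scaled-coefficient analogue

# Restore to the original data scale, as in the transformation above
B1_scaled = B1 / s
B0 = np.concatenate([[y.mean() - m @ B1_scaled], B1_scaled])

# Reference: ordinary least squares with an explicit constant column
b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
```

The first element of B0 is the constant term, matching the description of scaled equal to 0, where B contains p+1 coefficients with the constant term in the first row.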
ridge treats NaN values in X or y as missing values. ridge omits observations with missing values from the ridge regression fit.
In general, set scaled equal to 1 to produce plots where the coefficients are displayed on the same scale. See Ridge Regression for an example using a ridge trace plot, where the regression coefficients are displayed as a function of the ridge parameter. When making predictions, set scaled equal to 0. For an example, see Predict Values Using Ridge Regression.
Ridge, lasso, and elastic net regularization are all methods for estimating the coefficients of a linear model while penalizing large coefficients. The type of penalty depends on the method (see More About for more details). To perform lasso or elastic net regularization, use lasso.
If you have high-dimensional full or sparse predictor data, you can use fitrlinear to perform ridge regression instead of ridge. When using fitrlinear, specify the 'Regularization','ridge' name-value pair argument. Set the value of the 'Lambda' name-value pair argument to a vector of the ridge parameters of your choice. fitrlinear returns a trained linear model Mdl. You can access the coefficient estimates stored in the Beta property of the model by using Mdl.Beta.
[1] Hoerl, A. E., and R. W. Kennard. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics. Vol. 12, No. 1, 1970, pp. 55–67.

[2] Hoerl, A. E., and R. W. Kennard. “Ridge Regression: Applications to Nonorthogonal Problems.” Technometrics. Vol. 12, No. 1, 1970, pp. 69–82.

[3] Marquardt, D. W. “Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation.” Technometrics. Vol. 12, No. 3, 1970, pp. 591–612.

[4] Marquardt, D. W., and R. D. Snee. “Ridge Regression in Practice.” The American Statistician. Vol. 29, No. 1, 1975, pp. 3–20.
Introduced before R2006a