Chapter 4
Applying MATLAB Functions to Check for Overfitting
This chapter covers the following functions used in regularization and cross-validation:
lassoglm
cvpartition
crossval
Regularization
Regularization techniques are used to prevent statistical overfitting in a predictive model. By introducing additional information into the model, regularization algorithms can handle multicollinearity and redundant predictors by making the model more parsimonious and accurate.
Syntax
B = lasso(X,y)
These algorithms typically work by applying a penalty for complexity, such as adding the coefficients of the model into the minimization or including a roughness penalty.
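The penalty described above can be written out explicitly. For lasso applied to linear regression (the `lasso` syntax shown earlier), the fitted coefficients minimize the squared error plus an L1 penalty on the coefficient magnitudes; the notation below is a standard statement of that objective, not taken verbatim from this chapter:

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\mathsf{T}}\beta\right)^2
\;+\;\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
```

Larger values of the penalty weight λ drive more coefficients exactly to zero, which is how lasso removes redundant predictors; replacing the absolute values with squared terms gives the ridge (L2) penalty, which shrinks coefficients without zeroing them.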
Regularization for logistic regression can be performed simply in Statistics and Machine Learning Toolbox™ by using the lassoglm function. lassoglm is a model-fitting function that estimates the coefficients and intercept by coordinate descent optimization. It minimizes a combination of the model error and a penalty on the coefficient magnitudes, then selects the penalty strength that produces a model that generalizes well.
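A minimal sketch of this workflow is shown below. The simulated X and y are placeholders for your own predictors and binary response; the lassoglm call itself uses documented arguments ('binomial' distribution, 10-fold 'CV'), and the lambda minimizing cross-validated deviance is read from the returned FitInfo structure:

```matlab
% Simulated data: 200 observations, 10 predictors, binary response
rng(1)                                                 % reproducibility
X = randn(200, 10);
y = double(X(:,1) - 2*X(:,3) + 0.5*randn(200,1) > 0);

% L1-regularized logistic regression over a path of lambda values,
% with 10-fold cross-validation to estimate deviance at each lambda
[B, FitInfo] = lassoglm(X, y, 'binomial', 'CV', 10);

% Coefficients at the lambda that minimizes cross-validated deviance
idx  = FitInfo.IndexMinDeviance;
coef = [FitInfo.Intercept(idx); B(:, idx)];

% Predicted probabilities from the selected model
yhat = glmval(coef, X, 'logit');
```

Note that many entries of B(:,idx) are exactly zero: the L1 penalty has removed those predictors from the model.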
Additional classification models in Statistics and Machine Learning Toolbox provide optional regularization arguments that can be used to regularize a model during the training process:
fitclinear (binary linear classifier)
- "Lambda": regularization term strength
- "Regularization": lasso (L1), ridge (L2)

fitckernel (binary Gaussian kernel classifier)
- "Regularization": ridge (L2)
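As a sketch of how these arguments are passed, the call below trains a ridge-regularized logistic learner with fitclinear; the data are simulated placeholders, and the Lambda value is an arbitrary illustration rather than a recommended setting:

```matlab
% Simulated data: 300 observations, labels in {-1, 1}
rng(2)
X = randn(300, 5);
y = sign(X(:,1) + X(:,2) + 0.3*randn(300,1));

% Ridge (L2) regularized linear classifier; 'lasso' would give L1
Mdl = fitclinear(X, y, ...
    'Learner', 'logistic', ...
    'Regularization', 'ridge', ...
    'Lambda', 1e-2);                % regularization term strength

trainErr = loss(Mdl, X, y);         % resubstitution classification loss
```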
MATLAB Cross-Validation Functions
Statistics and Machine Learning Toolbox has two functions that are particularly useful when performing cross-validation: cvpartition and crossval.
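The two functions work together: cvpartition defines the train/test splits, and crossval applies a user-supplied function to each split. The sketch below, on simulated placeholder data, estimates the cross-validated misclassification rate of a logistic regression model fit with glmfit:

```matlab
% Simulated data: 150 observations, binary response
rng(3)
X = randn(150, 4);
y = double(X(:,1) + X(:,2) > 0);

% Stratified 5-fold partition of the observations
c = cvpartition(y, 'KFold', 5);

% crossval calls this handle once per fold as fun(Xtrain, ytrain, Xtest, ytest):
% fit a logistic model on the training fold, then return the
% misclassification rate on the held-out fold
misclass = @(Xtr, ytr, Xte, yte) ...
    mean(yte ~= (glmval(glmfit(Xtr, ytr, 'binomial'), Xte, 'logit') > 0.5));

rates   = crossval(misclass, X, y, 'Partition', c);  % one rate per fold
meanErr = mean(rates);                               % cross-validated estimate
```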