Support Vector Machine (SVM)

What Is a Support Vector Machine?

A support vector machine (SVM) is a supervised machine learning algorithm that finds the hyperplane that best separates data points of one class from those of another class.

How Support Vector Machines Work

A support vector machine is a supervised machine learning algorithm often used for classification and regression problems in applications such as signal processing, natural language processing (NLP), and speech and image recognition. The objective of the SVM algorithm is to find a hyperplane that, to the best degree possible, separates data points of one class from those of another class. This hyperplane is a line in 2D space, a plane in 3D space, and, more generally, an (n−1)-dimensional flat subspace in n-dimensional space, where n is the number of features for each observation in the data set. There can be multiple hyperplanes that separate the classes in the data. The optimal hyperplane, derived by the SVM algorithm, is the one that maximizes the margin between the two classes.

The margin is the maximal width of the slab parallel to the hyperplane that has no interior data points. The data points that mark the boundary of this parallel slab and are closest to the separating hyperplane are the support vectors. Support vectors refer to a subset of the training observations that identify the location of the separating hyperplane.
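
Concretely, for a separating hyperplane written as \( w^T x + b = 0 \), the margin boundaries are the two parallel hyperplanes \( w^T x + b = \pm 1 \), so the margin width is \( 2/\|w\| \). Maximizing the margin is therefore equivalent to minimizing \( \|w\| \), which is how the training problem is usually posed.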

Using an SVM algorithm to maximize the width of the margin between two classes, represented by plus and minus signs and separated by a hyperplane; the support vectors mark the margin boundaries.

Workflow for SVM Modeling

A typical workflow for building a support vector machine model includes the following steps:

Preprocess the Data

Training an SVM model with raw data often yields poor results because of missing values and outliers as well as information redundancy. You can perform data cleaning to handle any missing values or outliers and feature extraction to choose the right set of features from the data.

Feature extraction transforms raw data into numerical features that can be processed while preserving the information in the original data set. Dimensionality reduction techniques, such as principal component analysis (PCA), reduce the number of features while retaining the most important information. This approach makes support vector machine models robust and capable of handling complex data sets.

After your data is processed, split it into training and test sets. The SVM model is trained on the training set, and the test set is used to evaluate the model’s performance on unseen data. You can also use cross-validation, which repeatedly splits the data, to get a more reliable performance estimate when data is limited. A minimal MATLAB sketch of these preprocessing steps follows.
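
The sketch below assumes the data is in a table T whose last variable is the class label and whose other variables are numeric; the variable layout and the 80/20 split ratio are assumptions for the example:

    % Clean the data and extract numeric features
    T = rmmissing(T);                          % drop rows with missing values
    X = normalize(T{:, 1:end-1});              % z-score the numeric predictors
    y = T{:, end};                             % class labels

    % Reduce dimensionality with PCA, keeping 95% of the variance
    [~, score, ~, ~, explained] = pca(X);
    numComp = find(cumsum(explained) >= 95, 1);
    X = score(:, 1:numComp);

    % Hold out 20% of the observations as a test set
    cv = cvpartition(y, "Holdout", 0.2);
    XTrain = X(training(cv), :);  yTrain = y(training(cv));
    XTest  = X(test(cv), :);      yTest  = y(test(cv));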

Choose a Kernel

Based on data distribution, choose an appropriate kernel function (linear, polynomial, RBF, etc.). When a linear separation is not possible, a kernel function transforms the data to a higher dimensional space, making it easier to separate classes.

Choosing the Right Kernel for Your SVM

  • Gaussian or radial basis function (RBF): \( K(x_1, x_2) = \exp\left( -\dfrac{\|x_1 - x_2\|^2}{2\sigma^2} \right) \). Used for one-class learning; \( \sigma \) is the width of the kernel.
  • Linear: \( K(x_1, x_2) = x_1^T x_2 \). Used for two-class learning.
  • Polynomial: \( K(x_1, x_2) = \left( x_1^T x_2 + 1 \right)^\rho \), where \( \rho \) is the order of the polynomial.
  • Sigmoid: \( K(x_1, x_2) = \tanh\left( \beta_0 \, x_1^T x_2 + \beta_1 \right) \). A Mercer kernel only for certain values of \( \beta_0 \) and \( \beta_1 \).
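
In MATLAB, for example, the kernel is chosen with the KernelFunction name-value argument of fitcsvm; a minimal sketch, where X and y are placeholder predictors and labels:

    mdlLinear = fitcsvm(X, y, "KernelFunction", "linear");
    mdlPoly   = fitcsvm(X, y, "KernelFunction", "polynomial", "PolynomialOrder", 3);
    mdlRBF    = fitcsvm(X, y, "KernelFunction", "gaussian", "KernelScale", "auto");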

Train the SVM Model

Build and train your SVM model using a training data set. Training a support vector machine corresponds to solving a quadratic optimization problem to fit a hyperplane that maximizes the margin between the classes. The support vector machine algorithm identifies the support vectors and determines the optimal hyperplane. Use the trained model to classify new and unseen data points based on the optimal hyperplane.
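
As a sketch of the underlying optimization, the standard soft-margin training problem (the form solved, up to implementation details, by SVM solvers) is:

\[ \min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i \left( w^T x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \]

where \( C \) is the box constraint penalizing margin violations and \( \xi_i \) are slack variables. The training observations with nonzero multipliers in the dual of this problem are exactly the support vectors.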

Evaluate the SVM Model

Test the model using the test data set. Evaluate model performance using metrics such as accuracy, confusion matrix, precision, F1 score, or recall.
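
A minimal sketch of this evaluation in MATLAB, assuming a trained classifier mdl and the held-out XTest and yTest from the earlier split (labels are assumed to be categorical or numeric):

    predicted = predict(mdl, XTest);       % classify unseen observations
    accuracy  = mean(predicted == yTest);  % fraction of correct predictions
    confusionchart(yTest, predicted);      % per-class breakdown of errors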

Tune the Hyperparameters

Adjust hyperparameters to improve model performance. Use a search technique such as Bayesian optimization, which builds a probabilistic model of the objective function to efficiently locate the best hyperparameter values.
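
In MATLAB, for instance, fitcsvm can run Bayesian optimization internally; a sketch, where the evaluation budget of 30 is an arbitrary choice:

    mdl = fitcsvm(X, y, ...
        "OptimizeHyperparameters", "auto", ...  % tunes BoxConstraint and KernelScale
        "HyperparameterOptimizationOptions", ...
        struct("MaxObjectiveEvaluations", 30));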

Types of SVM Classifiers

Linear Support Vector Machines

Linear SVMs are used for linearly separable data having exactly two classes. This type of support vector machine algorithm uses a linear decision boundary to separate all the data points of the two classes.

The SVM algorithm can find such a hyperplane only for linearly separable problems. For more complex tasks, where the data is not perfectly separable, the support vector machine can use a soft margin, meaning a hyperplane that separates many, but not all, data points. The algorithm maximizes the margin while allowing a small number of misclassifications, as in the sketch below.
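
In fitcsvm, the softness of the margin is controlled by the BoxConstraint name-value argument; a small sketch, where the value 1 is just an illustrative default:

    % Larger BoxConstraint: fewer margin violations but a narrower margin;
    % smaller BoxConstraint: a wider margin that tolerates more misclassifications.
    mdlSoft = fitcsvm(X, y, "KernelFunction", "linear", "BoxConstraint", 1);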

Soft-margin SVM allowing the misclassification of one data point from class -1 (blue). (See MATLAB documentation.)

Nonlinear Support Vector Machines

SVMs are also used for nonlinear classification and regression tasks. For nonlinearly separable data, nonlinear support vector machines use kernel functions to transform the features. The number of transformed features is determined by the number of support vectors.

Kernel functions map the data to a different, often higher-dimensional space. This transformation can make the classes easier to separate by simplifying the complex nonlinear decision boundary to a linear boundary in the higher-dimensional, mapped feature space. In this process, commonly known as the kernel trick, the data does not have to be explicitly transformed, which would be computationally expensive. Kernel functions for nonlinear data include the polynomial, radial basis function (Gaussian), and multilayer perceptron or sigmoid (neural network) kernels.
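
Because of the kernel trick, the trained classifier can be evaluated using only kernel values between a new point and the support vectors:

\[ f(x) = \sum_{i \in \mathcal{S}} \alpha_i y_i K(x_i, x) + b, \]

where \( \mathcal{S} \) indexes the support vectors and \( \alpha_i \) are the dual coefficients. The higher-dimensional feature space is never computed explicitly, which is also why the number of transformed features is tied to the number of support vectors.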

A nonlinear SVM classifier trained in MATLAB with a Gaussian kernel function. (See code.)

Support Vector Regression

SVMs are primarily used for classification tasks, but they can also be adapted for regression. SVM regression is considered a nonparametric technique because it relies on kernel functions. Unlike in linear regression, where the form of the relationship between the response (output) and predictor (input) variables is assumed in advance, support vector regression (SVR) discovers this relationship from the data. SVR does this by identifying a hyperplane that best fits the data within a specified margin of tolerance, commonly denoted epsilon (\( \varepsilon \)), while keeping the prediction error to a minimum.

The working principle of SVR is the same as that of support vector machine classifiers, except that SVR aims to predict continuous values instead of discrete classes. SVR can handle both linear and nonlinear data by using the different kernel types. Using the kernel trick, you can perform nonlinear regression by mapping data to a high-dimensional space.
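
A minimal MATLAB sketch of Gaussian-kernel SVR, where the epsilon value of 0.1 is an arbitrary illustration:

    mdl = fitrsvm(X, y, "KernelFunction", "gaussian", ...
        "KernelScale", "auto", "Epsilon", 0.1, "Standardize", true);
    yPred = predict(mdl, XTest);           % continuous-valued predictions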

Why Support Vector Machines Are Important

SVM Advantages

SVMs are among the most popular supervised learning algorithms in machine learning and artificial intelligence, mainly because they can handle high-dimensional data and complex decision boundaries effectively. The main advantages of support vector machines are:

  • Data versatility. SVMs are particularly effective when dealing with high-dimensional and unstructured data sets. Support vector machines can be used for both linear and nonlinear data, making them useful for many applications.
  • Robustness. SVMs are less prone to overfitting, especially in high-dimensional spaces, due to the regularization parameter that controls the trade-off between achieving a low error on training data and minimizing the norm of the weights.
  • Interpretability. The decision boundary created by a linear SVM classifier is clear and interpretable, which can be beneficial for understanding model predictions and making informed decisions.
  • Accuracy. SVMs are highly accurate and effective for smaller data sets, especially in cases where the number of dimensions exceeds the number of samples.

SVM Applications

The aforementioned advantages make support vector machines an attractive choice for modeling data in a wide range of applications:

  • Natural language processing. SVMs are widely used in NLP tasks such as spam detection and sentiment analysis by classifying text into categories.
  • Computer vision. SVMs are used in image classification tasks, such as handwriting recognition and face or object detection, and as a medical diagnostic tool for classifying MRI images that might indicate the presence of a tumor.
  • Signal processing. SVMs are also applied to signal data for tasks such as anomaly detection, speech recognition, and biomedical signal analysis.
  • Anomaly detection. SVMs can be trained to find a hyperplane that separates normal data from anomalies.
  • Bioinformatics. SVMs are applied to biological data classification tasks, such as protein classification and gene expression profile analysis.

A hyperspectral image classified using an SVM classifier. (See MATLAB code.)

SVM Disadvantages

Like all machine learning models, support vector machines also have limitations:

  • Large data sets. SVMs are not well suited to large data sets because of their high computational cost and memory requirements. While the kernel trick enables support vector machines to handle nonlinear data, it makes them computationally expensive: training requires solving a quadratic optimization problem whose cost grows rapidly with the number of observations. For large data sets, the kernel matrix also becomes large, increasing memory requirements.
  • Noisy data. SVMs can perform poorly on noisy data because overlapping classes and mislabeled points distort the margin that the algorithm tries to maximize.
  • Interpretability. While linear SVMs are interpretable, nonlinear SVMs are not. The complex transformations involved in nonlinear SVMs render the decision boundary difficult to interpret.
Comparison of predictive power and interpretability for popular machine learning algorithms. SVMs offer high predictive power but relatively low interpretability.

Support Vector Machines with MATLAB

With MATLAB® and Statistics and Machine Learning Toolbox™, you can train, evaluate, and make predictions with SVM models for classification and regression. From feature selection and hyperparameter tuning to cross-validation and performance metrics, MATLAB provides you with tools for building efficient support vector machine models. Low-code machine learning apps in MATLAB enable you to train and evaluate SVMs interactively, generate C/C++ code, and deploy to CPUs and microcontrollers, all without writing code yourself.

Preprocessing Data

To ensure accurate results, data must be free from outliers and ready for model training. With MATLAB, you can perform cleaning tasks, such as handling missing values and outliers, normalizing data, and smoothing data. You can use the data preprocessing Live Editor tasks or Data Cleaner app to preprocess your data interactively. These apps also generate code.

MATLAB supports various data types, such as time-series data, text, images, and audio. Specialized toolboxes, such as Audio Toolbox™ and Signal Processing Toolbox™, provide feature extraction capabilities, enabling you to measure distinctive features in different domains and reuse intermediate computations.

Training the SVM Model

You can train your SVM models for binary or multiclass classification and regression tasks using the fitcsvm and fitrsvm functions. For nonlinear support vector machines, several kernel functions (e.g., linear, polynomial, and RBF) are supported, or you can create and specify a custom kernel function (e.g., sigmoid).
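
A custom kernel is supplied as a function on the MATLAB path that maps two matrices of observations to a Gram matrix; a sketch of a sigmoid kernel, with arbitrary parameter values:

    function G = mysigmoid(U, V)
    % Sigmoid kernel: G(i,j) = tanh(beta0 * U(i,:) * V(j,:)' + beta1)
    beta0 = 1;                             % slope (illustrative value)
    beta1 = -1;                            % intercept (illustrative value)
    G = tanh(beta0 * (U * V') + beta1);
    end

You can then pass the function name when training, for example, fitcsvm(X, y, "KernelFunction", "mysigmoid").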

You can also train SVM models interactively using the Classification Learner app and Regression Learner app. With these apps, you can perform the complete workflow for an SVM model from training to tuning without needing to write code. The apps let you explore data, select features, perform automated training, optimize hyperparameters, and assess results.

Validation confusion matrix of ionosphere data modeled by a linear SVM, created using the Classification Learner app. (See MATLAB code.)

The apps can generate C/C++ code and let you export your SVM model, making it easy for you to share your results and further investigate them outside of the app. For example, you can export your support vector machine model from the Classification Learner app or the Regression Learner app and import it into the Experiment Manager app to perform additional tasks, such as changing the training data, adjusting hyperparameter search ranges, and running custom training experiments.

Making Predictions

After training the SVM model, predict labels using the predict function. You can simulate your trained SVM model in Simulink with the ClassificationSVM Predict or RegressionSVM Predict blocks.
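
For example, assuming a trained classification model mdl and a matrix of new observations XNew:

    [labels, scores] = predict(mdl, XNew); % scores reflect distance to the boundary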

Evaluating Results

You can evaluate the SVM model’s performance programmatically, using functions such as confusionchart and rocmetrics, or interactively. Using the machine learning apps, you can compare the performance of different models to find the model that best fits your data.
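
A sketch of a programmatic evaluation with ROC analysis, reusing the trained classifier and test set from the earlier examples:

    [~, scores] = predict(mdl, XTest);                  % classification scores
    rocObj = rocmetrics(yTest, scores, mdl.ClassNames); % ROC and AUC per class
    plot(rocObj)                                        % visualize the ROC curves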

After training classifiers in the Classification Learner app, you can compare models based on accuracy, visualize classifier results by plotting class predictions, and check performance using a confusion matrix, ROC curve, or precision-recall curve.

Similarly, in the Regression Learner app, you can compare models based on model metrics, visualize regression results in a response plot or by plotting the actual versus predicted response, and evaluate models using a residual plot.

Plot of the predicted response versus the actual response for a regression model trained in the Regression Learner app. The closer the points are to the line, the better the prediction. (See MATLAB code.)