Create Dummy Variables for Categorical Predictors and Generate C/C++ Code
This example shows how to generate code for classifying data using a support vector machine (SVM) model. Train the model using numeric predictors and encoded categorical predictors. Use dummyvar to convert categorical predictors to numeric dummy variables before fitting an SVM classifier. When passing new data to your trained model, you must preprocess the data in the same way.
Alternatively, if a trained model identifies categorical predictors in the CategoricalPredictors property, then you do not need to create dummy variables manually to generate code. The software handles categorical predictors automatically. For an example, see Generate Code to Classify Data in Table.
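For comparison, here is a minimal sketch of that alternative, assuming the same patients data set. The categorical variables stay in a table, and fitcsvm records their positions in the CategoricalPredictors property. The table variable names Gender and Health and the model name MdlCat are illustrative choices that are not part of this example.

```matlab
% Alternative sketch: let fitcsvm handle the categorical predictors itself
load patients
T = table(Diastolic, Systolic, categorical(Gender), ...
    categorical(SelfAssessedHealthStatus), ...
    'VariableNames', {'Diastolic','Systolic','Gender','Health'});

% Categorical table variables are detected automatically as categorical predictors
MdlCat = fitcsvm(T, Smoker, 'KernelFunction','gaussian','KernelScale','auto');

% Indices of the predictors treated as categorical (expected: 3 and 4)
MdlCat.CategoricalPredictors
```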
Preprocess Data and Train SVM Classifier
Load the patients data set. Create a table using the Diastolic and Systolic numeric variables. Each row of the table corresponds to a different patient.
load patients
tbl = table(Diastolic,Systolic);
head(tbl)
    Diastolic    Systolic
    _________    ________
       93          124   
       77          109   
       83          125   
       75          117   
       80          122   
       70          121   
       88          130   
       82          115   
Convert the Gender variable to a categorical variable. The order of the categories in categoricalGender is important because it determines the order of the columns in the predictor data. Use dummyvar to convert the categorical variable to a matrix of zeros and ones, where a 1 value in the (i,j)th entry indicates that the ith patient belongs to the jth category.
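To make the layout of the dummy matrix concrete, this sketch uses a small hypothetical three-observation vector rather than the patients data. By default, categorical sorts the category names alphabetically, and dummyvar returns one column per category in that order.

```matlab
% Hypothetical three-observation example (not from the patients data set)
g = categorical({'Male'; 'Female'; 'Female'});
order = categories(g)   % categories sorted alphabetically: {'Female'; 'Male'}
D = dummyvar(g)         % column 1 = Female, column 2 = Male
% D(1,:) is [0 1] (Male); D(2,:) and D(3,:) are [1 0] (Female)
```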
categoricalGender = categorical(Gender);
orderGender = categories(categoricalGender)
orderGender = 2×1 cell
    {'Female'}
    {'Male'  }
dummyGender = dummyvar(categoricalGender);
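Note: Every row of dummyGender contains exactly one 1, so its columns always sum to a column of ones. The dummy columns are therefore perfectly collinear with a constant (intercept) term, and a predictor matrix that contains both is rank deficient. Depending on the type of model you train, this rank deficiency can be problematic. For example, when training linear models that include an intercept, remove the first column of the dummy variables.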
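The following sketch checks this note on a hypothetical four-observation sample. The two dummy columns sum to a column of ones, so appending them to an intercept column produces a matrix whose rank is lower than its number of columns. Dropping the first dummy column, which then acts as the reference category, restores full column rank.

```matlab
% Hypothetical four-observation example
g = categorical({'Male'; 'Female'; 'Female'; 'Male'});
D = dummyvar(g);                 % columns: Female, Male

A = [ones(4,1) D];               % intercept plus both dummy columns
rank(A)                          % 2, although A has 3 columns

Dreduced = D(:,2:end);           % drop the first (reference) column
B = [ones(4,1) Dreduced];
rank(B)                          % 2, equal to the number of columns of B
```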
Create a table that contains the dummy variable dummyGender with the corresponding variable headings. Combine this new table with tbl.
tblGender = array2table(dummyGender,'VariableNames',orderGender);
tbl = [tbl tblGender];
head(tbl)
    Diastolic    Systolic    Female    Male
    _________    ________    ______    ____
       93          124         0        1  
       77          109         0        1  
       83          125         1        0  
       75          117         1        0  
       80          122         1        0  
       70          121         1        0  
       88          130         1        0  
       82          115         0        1  
Convert the SelfAssessedHealthStatus variable to a categorical variable. Note the order of the categories in categoricalHealth, and convert the variable to a numeric matrix using dummyvar.
categoricalHealth = categorical(SelfAssessedHealthStatus);
orderHealth = categories(categoricalHealth)
orderHealth = 4×1 cell
    {'Excellent'}
    {'Fair'     }
    {'Good'     }
    {'Poor'     }
dummyHealth = dummyvar(categoricalHealth);
Create a table that contains dummyHealth with the corresponding variable headings. Combine this new table with tbl.
tblHealth = array2table(dummyHealth,'VariableNames',orderHealth);
tbl = [tbl tblHealth];
head(tbl)
    Diastolic    Systolic    Female    Male    Excellent    Fair    Good    Poor
    _________    ________    ______    ____    _________    ____    ____    ____
       93          124         0        1          1         0       0       0  
       77          109         0        1          0         1       0       0  
       83          125         1        0          0         0       1       0  
       75          117         1        0          0         1       0       0  
       80          122         1        0          0         0       1       0  
       70          121         1        0          0         0       1       0  
       88          130         1        0          0         0       1       0  
       82          115         0        1          0         0       1       0  
The third row of tbl, for example, corresponds to a patient with these characteristics: diastolic blood pressure of 83, systolic blood pressure of 125, female, and good self-assessed health status.
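You can also read the encoding in reverse. This sketch, which rebuilds the dummy matrices from the patients data set, recovers the category of patient 3 by selecting the category name whose dummy column holds the 1. The variable p is an illustrative name.

```matlab
% Sketch: recover the categories of one patient from the dummy columns
load patients
orderGender = categories(categorical(Gender));
orderHealth = categories(categorical(SelfAssessedHealthStatus));
dummyGender = dummyvar(categorical(Gender));
dummyHealth = dummyvar(categorical(SelfAssessedHealthStatus));

p = 3;   % patient (row) index
genderOfPatient = orderGender{dummyGender(p,:) == 1}   % expected 'Female'
healthOfPatient = orderHealth{dummyHealth(p,:) == 1}   % expected 'Good'
```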
Because all the values in tbl are numeric, you can convert the table to a matrix X.
X = table2array(tbl);
Train an SVM classifier using X and a Gaussian kernel function with an automatic kernel scale. Specify the Smoker variable as the response.
Y = Smoker;
Mdl = fitcsvm(X,Y, ...
    'KernelFunction','gaussian','KernelScale','auto');
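The fitcsvm documentation notes that the 'auto' kernel-scale heuristic uses subsampling, so the selected scale can vary from one call to another. This sketch fixes the random seed with rng before each fit so that the selected scale is reproducible. It builds the same 8-column predictor matrix in a single line, which relies on the default alphabetical category order, and the names Mdl1 and Mdl2 are illustrative.

```matlab
load patients
% Same columns as X in this example: Diastolic, Systolic, Female, Male,
% Excellent, Fair, Good, Poor (default alphabetical category order)
X = [Diastolic Systolic dummyvar(categorical(Gender)) ...
    dummyvar(categorical(SelfAssessedHealthStatus))];

rng('default')   % fix the seed used by the 'auto' kernel-scale heuristic
Mdl1 = fitcsvm(X, Smoker, 'KernelFunction','gaussian','KernelScale','auto');
rng('default')
Mdl2 = fitcsvm(X, Smoker, 'KernelFunction','gaussian','KernelScale','auto');

% With the same seed, both fits select the same kernel scale
[Mdl1.KernelParameters.Scale Mdl2.KernelParameters.Scale]
```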
Generate C/C++ Code
Generate code that loads the SVM classifier, takes new predictor data as an input argument, and then classifies the new data.
Save the SVM classifier to a file using saveLearnerForCoder.
saveLearnerForCoder(Mdl,'SVMClassifier')
saveLearnerForCoder saves the classifier to the MATLAB® binary file SVMClassifier.mat as a structure array in the current folder.
Define the entry-point function mySVMPredict, which takes new predictor data as an input argument. Within the function, load the SVM classifier by using loadLearnerForCoder, and then pass the loaded classifier to predict.
function label = mySVMPredict(X) %#codegen
    % Load the saved SVM classifier and classify the predictor data in X
    Mdl = loadLearnerForCoder('SVMClassifier');
    label = predict(Mdl,X);
end
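Generate code for mySVMPredict by using codegen. Specify the data type and dimensions of the new predictor data by using coder.typeof so that the generated code accepts a variable-size array. In coder.typeof(X,[Inf 8],[1 0]), the size vector [Inf 8] and the variable-dimension vector [1 0] specify a double matrix whose number of rows can vary without an upper bound and whose number of columns is fixed at 8.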
codegen mySVMPredict -args {coder.typeof(X,[Inf 8],[1 0])}
Code generation successful.
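The sketch below compares the input type used above with two other coder.typeof variants, assuming MATLAB Coder is installed. The row bound of 1000 is a hypothetical value. A bounded row dimension can let the generated code avoid dynamic memory allocation, and omitting the size arguments yields a fixed-size type.

```matlab
% Example predictor matrix: 8 columns, any number of rows
X = zeros(5,8);

% Rows: variable-size and unbounded; columns: fixed at 8 (the type used above)
tUnbounded = coder.typeof(X,[Inf 8],[1 0]);

% Rows: variable-size with a hypothetical upper bound of 1000
tBounded = coder.typeof(X,[1000 8],[1 0]);

% No size arguments: fixed size, exactly 5-by-8
tFixed = coder.typeof(X);

tUnbounded.SizeVector     % expected [Inf 8]
tUnbounded.VariableDims   % expected [1 0]
```

To use the bounded type instead, you could pass it to codegen, for example codegen mySVMPredict -args {tBounded}.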
Verify that mySVMPredict and the MEX file return the same results for the training data.
label = predict(Mdl,X);
mylabel = mySVMPredict(X);
mylabel_mex = mySVMPredict_mex(X);
verifyMEX = isequal(label,mylabel,mylabel_mex)
verifyMEX = logical
   1
Predict Labels for New Data
To predict labels for new data, you must first preprocess the new data. If you run the generated code in the MATLAB environment, you can follow the preprocessing steps described in this section. If you deploy the generated code outside the MATLAB environment, you must implement equivalent preprocessing in the target environment, because the generated code expects numeric predictor data that is already encoded. In either case, you must ensure that the new data has the same columns, in the same order, as the training data X.
In this example, take the third, fourth, and fifth patients in the patients data set. Preprocess the data for these patients so that the resulting numeric matrix matches the form of the training data.
Convert the categorical variables to dummy variables. Because the new observations might not include values from all categories, you need to specify the same categories as the ones used during training and maintain the same category order. In MATLAB, pass the ordered cell array of category names associated with the corresponding training data variable (in this example, orderGender for gender values and orderHealth for self-assessed health status values).
newcategoricalGender = categorical(Gender(3:5),orderGender);
newdummyGender = dummyvar(newcategoricalGender);
newcategoricalHealth = categorical(SelfAssessedHealthStatus(3:5),orderHealth);
newdummyHealth = dummyvar(newcategoricalHealth);
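This sketch shows why passing the training categories matters, using hypothetical new data in which every patient is female. Without the training categories, categorical sees only one category and dummyvar returns a single column. With the training categories, the unused 'Male' category keeps its column, so the column layout matches the training data.

```matlab
orderGender = {'Female'; 'Male'};           % category order used during training
newGender = {'Female'; 'Female'; 'Female'}; % hypothetical new data with no 'Male'

% Without the training categories: only one category, so one column
d1 = dummyvar(categorical(newGender));              % 3-by-1

% With the training categories: 'Male' keeps its (all-zero) column
d2 = dummyvar(categorical(newGender, orderGender)); % 3-by-2, [1 0] in every row
```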
Combine all the new data into a numeric matrix.
newX = [Diastolic(3:5) Systolic(3:5) newdummyGender newdummyHealth]
newX = 3×8
    83   125     1     0     0     0     1     0
    75   117     1     0     0     1     0     0
    80   122     1     0     0     0     1     0
Note that newX corresponds exactly to the third, fourth, and fifth rows of the matrix X.
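To keep the training and prediction preprocessing from drifting apart, you could wrap these steps in one function and use it for both. preprocessPatients is a hypothetical helper, not a toolbox function. It assumes that the raw inputs are column vectors and that orderGender and orderHealth come from the training data.

```matlab
function Xout = preprocessPatients(diastolic, systolic, gender, health, ...
    orderGender, orderHealth)
%PREPROCESSPATIENTS Hypothetical helper, not a toolbox function.
%   Xout = preprocessPatients(diastolic,systolic,gender,health,orderGender,orderHealth)
%   returns an N-by-8 numeric matrix with columns in the training order:
%   diastolic, systolic, gender dummy variables, health dummy variables.
%   orderGender and orderHealth are the category orders used during training,
%   so categories that are absent from the new data still get a column.
%
%   Example (assumes the patients data set):
%       load patients
%       orderGender = categories(categorical(Gender));
%       orderHealth = categories(categorical(SelfAssessedHealthStatus));
%       newX = preprocessPatients(Diastolic(3:5),Systolic(3:5), ...
%           Gender(3:5),SelfAssessedHealthStatus(3:5),orderGender,orderHealth);

genderDummies = dummyvar(categorical(gender, orderGender));
healthDummies = dummyvar(categorical(health, orderHealth));
Xout = [diastolic systolic genderDummies healthDummies];
end
```

Calling the same function on the full data set and on rows 3 to 5 should reproduce X and newX, respectively.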
Verify that mySVMPredict and the MEX file return the same results for the new data.
newlabel = predict(Mdl,newX);
newmylabel = mySVMPredict(newX);
newmylabel_mex = mySVMPredict_mex(newX);
newverifyMEX = isequal(newlabel,newmylabel,newmylabel_mex)
newverifyMEX = logical
   1
See Also
dummyvar | categorical | ClassificationSVM | codegen (MATLAB Coder) | coder.typeof (MATLAB Coder) | loadLearnerForCoder | coder.Constant (MATLAB Coder) | saveLearnerForCoder