Compress Machine Learning Model for Memory-Limited Hardware
This example shows how to reduce the size of a machine learning model for deployment to memory-limited hardware. To demonstrate the model compression workflow, the example builds models for the acoustic scene classification (ASC) task, which classifies environments from the sounds they produce. ASC is a generic multiclass classification problem that is foundational for context awareness in devices, robots, and other applications [1].
Assume that you want to build a model for hearing aids where the available memory size is 30 KB. First, simplify the multiclass ASC task to a binary classification problem, and then perform these steps:
Reduce the number of features by selecting important features.
Optimize hyperparameters with coupled constraints, which limit the size of a machine learning model.
Quantize model parameters.
For more details on optimizing hyperparameters to reduce the memory size, see More About.
Load Data
Load the acousticscenes data set, and display the variables in the data set.
load("acousticscenes.mat")
whos
  Name         Size        Bytes      Class          Attributes

  xEval       300x286       686400    double
  xTest       300x286       686400    double
  xTrain      1500x286     3432000    double
  yEval       300x1           2102    categorical
  yTest       300x1           2102    categorical
  yTrain      1500x1          3302    categorical
xTrain, xEval, and xTest contain features extracted from the TUT acoustic scene data set using wavelet scattering. yTrain, yEval, and yTest contain acoustic scene labels of 15 different types for xTrain, xEval, and xTest, respectively. In this example, you use xTrain and yTrain to train models and xTest and yTest to test the accuracy of the trained models. During the optimization step, you use xEval and yEval as a holdout validation set.
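As a quick check, you can list the 15 scene types in the training labels by using the categories function:

categories(yTrain)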
The TUT acoustic scene data set provides development data (TUT-acoustic-scenes-2017-development [3]) and test data (TUT-acoustic-scenes-2017-evaluation [4]). The development data provides a 4-fold cross-validation setup. xTrain and xEval are from the subsets of the training and evaluation sets (respectively) defined by the first fold of the cross-validation setup, and xTest is from the subset of the test data set. The example Acoustic Scene Recognition Using Late Fusion (Audio Toolbox) describes how you can obtain these variables from a subset of the TUT acoustic scene data set.
Normalize the data sets.
[xTrain,mu,sigma] = normalize(xTrain);
xEval = normalize(xEval,center=mu,scale=sigma);
xTest = normalize(xTest,center=mu,scale=sigma);
Select Classification Model Types
Select types of classification models for this example by using the Classification Learner app.
On the Apps tab, open the apps gallery. Then, in the Machine Learning and Deep Learning group, click Classification Learner.
On the Classification Learner tab, in the File section, click New Session and select From Workspace. In the dialog box, specify yTrain as the response variable, and specify the variables in xTrain as predictors.
In the Models section of the app, click All. This option trains all the model presets available for your data set.
In the Train section, click Train All and select Train All.
You can compare trained models based on accuracy scores, visualize results by plotting class predictions, and check performance using the confusion matrix and ROC curve. For more details on Classification Learner, see Train Classification Models in Classification Learner App.
In this example, you work with these five model types:
Bilayered neural network
Linear discriminant
Random subspace ensemble with discriminant analysis learners
Linear SVM
Logistic regression
Create a variable containing the model names.
MdlNames = ["Bilayered NN","Linear Discriminant", ...
    "Subspace Discriminant","Linear SVM","Logistic Regression"]';
Train Multiclass Classification Models
Train the five models using fitting functions at the command line, and then reduce the size of the trained models by using the compact function. The compact function discards information that is not necessary for prediction.
SVM and logistic regression models support only binary classification. Therefore, use the fitcecoc function to train a multiclass classification model with linear SVM learners and a multiclass classification model with logistic regression learners. For the logistic regression model, use a templateLinear learner; in this case, you do not use the compact function because fitcecoc returns a compact model object (CompactClassificationECOC).
rng("default") % For reproducibility multiMdls = cell(5,1); % Bilayered NN multiMdls{1} = compact(fitcnet(xTrain,yTrain,LayerSizes=[10 10])); % Linear Discriminant multiMdls{2} = compact(fitcdiscr(xTrain,yTrain)); % Subspace Discriminant multiMdls{3} = compact(fitcensemble(xTrain,yTrain, ... Method="Subspace",Learners="discriminant", ... NumLearningCycles=30,NPredToSample=25)); % Linear SVM multiMdls{4} = compact(fitcecoc(xTrain,yTrain)); % Logistic Regression tLinear = templateLinear(Learner="logistic"); multiMdls{5} = fitcecoc(xTrain,yTrain,Learners=tLinear);
Specify the output display format as bank to display two digits after the decimal point.
format("bank")
Test the models with the test data set by using the helper function helperMdlMetrics. This function returns a table of model metrics, including the model accuracy as a percentage and the model size in KB. The code for the helperMdlMetrics function appears at the end of this example.
multiMdlTbl = helperMdlMetrics(multiMdls,xTest,yTest);
tbl1 = multiMdlTbl;
tbl1.Properties.RowNames = MdlNames;
disp(tbl1)
                             Accuracy    Model Size
                             ________    __________
    Bilayered NN              54.33         36.17
    Linear Discriminant       53.33       2776.71
    Subspace Discriminant     50.67        881.54
    Linear SVM                34.33        901.90
    Logistic Regression       50.00       1937.67
The size of each model is more than 30 KB, and the accuracy value is approximately 50% for most models.
Simplify Problem as Binary Classification
For the hearing aid application, assume that you want to distinguish only between background sounds and sounds from specific sources, instead of classifying sounds into the 15 types included in the data set. Group the sound types into two categories (AllAround and Directional) by using the mergecats function.
AllAround = ["beach","forest_path","park","office","home", ...
    "library","city_center","residential_area"];
Directional = ["train","bus","car","tram","grocery_store", ...
    "metro_station","cafe/restaurant"];
yTrainMapped = mergecats(yTrain,AllAround,"AllAround");
yTrainMapped = mergecats(yTrainMapped,Directional,"Directional");
yEvalMapped = mergecats(yEval,AllAround,"AllAround");
yEvalMapped = mergecats(yEvalMapped,Directional,"Directional");
yTestMapped = mergecats(yTest,AllAround,"AllAround");
yTestMapped = mergecats(yTestMapped,Directional,"Directional");
Create a grouped scatter plot of the first two principal components to see whether the binary grouping works.
figure
[~,score] = pca(xTrain);
gscatter(score(:,1),score(:,2),yTrainMapped)
xlabel("First principal component")
ylabel("Second principal component")
Train Binary Classification Models
Train the models for the binary sound labels yTrainMapped. For the linear SVM model, reduce the memory size by discarding the support vectors by using the discardSupportVectors function. The model can still predict new data using the linear predictor coefficients stored in the Beta model property. For the logistic regression model, the fitclinear function returns a compact model that does not store the training data.
rng("default") binaryMdls = cell(5,1); % Bilayered NN binaryMdls{1} = compact(fitcnet(xTrain,yTrainMapped,LayerSizes=[10 10])); % Linear Discriminant binaryMdls{2} = compact(fitcdiscr(xTrain,yTrainMapped)); % Subspace Discriminant binaryMdls{3} = compact(fitcensemble(xTrain,yTrainMapped, ... Method="Subspace",Learners="discriminant",NumLearningCycles=30,NPredToSample=25)); % Linear SVM binaryMdls{4} = discardSupportVectors(compact(fitcsvm(xTrain,yTrainMapped))); % Logistic Regression binaryMdls{5} = fitclinear(xTrain,yTrainMapped,Learner="logistic");
Test the binary classification models with the test data set and the mapped labels yTestMapped.
binaryMdlTbl = helperMdlMetrics(binaryMdls,xTest,yTestMapped);
tbl2 = table(multiMdlTbl,binaryMdlTbl);
tbl2.Properties.RowNames = MdlNames;
tbl2.Properties.VariableNames = ["Multiclass","Binary"];
disp(tbl2)
                                   Multiclass                  Binary
                             Accuracy    Model Size     Accuracy    Model Size
                             ______________________     ______________________
    Bilayered NN              54.33         36.17        99.33         31.89
    Linear Discriminant       53.33       2776.71        98.00       1314.90
    Subspace Discriminant     50.67        881.54        99.33        552.08
    Linear SVM                34.33        901.90        97.00          8.74
    Logistic Regression       50.00       1937.67        98.67         18.60
The trained models accurately classify the acoustic scenes for the binary classification problem. The linear SVM and logistic regression models are smaller than 30 KB.
Train Models with Fewer Features
You can make machine learning models smaller without losing too much accuracy by building models using only important features. xTrain, xTest, and xEval include 286 features. Select the 50 most important features by using the fscmrmr function, which ranks features using the minimum redundancy maximum relevance (MRMR) algorithm.
idx = fscmrmr(xTrain,yTrainMapped);
xTrainSelected = xTrain(:,idx(1:50));
xEvalSelected = xEval(:,idx(1:50));
xTestSelected = xTest(:,idx(1:50));
Train binary classification models using the selected features.
rng("default") feat50binaryMdls = cell(5,1); % Bilayered NN feat50binaryMdls{1} = compact(fitcnet(xTrainSelected,yTrainMapped,LayerSizes=[10 10])); % Linear Discriminant feat50binaryMdls{2} = compact(fitcdiscr(xTrainSelected,yTrainMapped)); % Subspace Discriminant feat50binaryMdls{3} = compact(fitcensemble(xTrainSelected,yTrainMapped, ... Method="Subspace",Learners="discriminant",NumLearningCycles=30,NPredToSample=25)); % Linear SVM feat50binaryMdls{4} = discardSupportVectors(compact(fitcsvm(xTrainSelected,yTrainMapped))); % Logistic Regression feat50binaryMdls{5} = fitclinear(xTrainSelected,yTrainMapped,Learner="logistic");
Test the models with the test data set and the mapped labels yTestMapped.
feat50binaryMdlTbl = helperMdlMetrics(feat50binaryMdls,xTestSelected,yTestMapped);
tbl3 = table(multiMdlTbl,binaryMdlTbl,feat50binaryMdlTbl);
tbl3.Properties.RowNames = MdlNames;
tbl3.Properties.VariableNames = ["Multiclass","Binary","50 Features"];
disp(tbl3)
                                   Multiclass                  Binary                 50 Features
                             Accuracy    Model Size     Accuracy    Model Size     Accuracy    Model Size
                             ______________________     ______________________     ______________________
    Bilayered NN              54.33         36.17        99.33         31.89        90.67         11.38
    Linear Discriminant       53.33       2776.71        98.00       1314.90        95.33         51.70
    Subspace Discriminant     50.67        881.54        99.33        552.08        91.33        541.91
    Linear SVM                34.33        901.90        97.00          8.74        96.33          4.82
    Logistic Regression       50.00       1937.67        98.67         18.60        97.00         12.18
In addition to the linear SVM and logistic regression models, the bilayered neural network model is now also smaller than 30 KB. However, reducing the number of features decreases the accuracy of the trained models.
Restore the default display format.
format("default")
Optimize Neural Network with Coupled Constraints
Find optimal model hyperparameters while limiting the memory use of the models. The constraints depend on the type of machine learning model. For example, you can limit the number of support vectors for an SVM model or the number of weight parameters in a neural network model. For more details on Bayesian optimization and an example for an SVM model, see Constraints in Bayesian Optimization. This example shows optimization with coupled constraints for a bilayered neural network model.
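For instance, to couple an SVM's memory use to the optimization, a constraint can bound the number of stored support vectors. The following sketch uses a hypothetical helper name (helperConstrainedSVM is not part of this example) and assumes the optimized hyperparameters include BoxConstraint and KernelScale; it follows the same pattern as the neural network helper used later in this example.

function [objective,constraint] = helperConstrainedSVM(params,xTrain,yTrain,xEval,yEval,maxSV)
% Sketch: train an SVM with the candidate hyperparameters, and couple the
% validation loss to a limit on the number of stored support vectors.
mdl = fitcsvm(xTrain,yTrain, ...
    BoxConstraint=params.BoxConstraint, ...
    KernelScale=params.KernelScale);
objective = loss(mdl,xEval,yEval);
% Positive when the model stores more support vectors than the budget maxSV.
constraint = size(mdl.SupportVectors,1) - maxSV;
end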
For optimization with coupled constraints, specify the hyperparameters to optimize and define a customized objective function. Then, use the bayesopt function to find the optimal hyperparameters based on the objective function.
First, get the default hyperparameters of the bilayered neural network model by using the hyperparameters function.
params_bilayeredNet = hyperparameters("fitcnet",xTrainSelected,yTrainMapped);
Modify the first, third, and ninth hyperparameters, which correspond to NumLayers, Standardize, and Layer_3_Size, so that they are not optimized. In this way, you build a bilayered model and use the training data without standardization, because the data is already standardized.
params_bilayeredNet(1).Range = [1 2]; % NumLayers
params_bilayeredNet(1).Optimize = false;
params_bilayeredNet(3).Optimize = false; % Standardize
params_bilayeredNet(9).Optimize = false; % Layer_3_Size
Use the customized objective function helperOptimizeConstrainedBilayer, which trains a bilayered neural network model using a given set of parameters for the training data set, and returns the loss for the holdout validation set. The code for the helperOptimizeConstrainedBilayer function appears at the end of this example. The function also accepts the upper limit for the number of weight parameters in the model and returns a constraint value. A positive constraint value indicates that the number of parameters is greater than the specified limit.
Define a function handle fun that takes the hyperparameters and calls the helperOptimizeConstrainedBilayer function. Specify the upper limit for the number of weight parameters as 300. For example, for a model with 50 predictors, two classes, a Layer_1_Size of 4, and a Layer_2_Size of 7, the helper function estimates 50*4 + 4*7 + 7*2 = 242 weights, which yields the feasible constraint value 242 - 300 - 0.5 = -58.5.
fun = @(params)helperOptimizeConstrainedBilayer(params, ...
    xTrainSelected,yTrainMapped,xEvalSelected,yEvalMapped,300);
When you call the bayesopt function, specify the objective function as fun and the hyperparameters as params_bilayeredNet. Also, specify NumCoupledConstraints as 1 to indicate that the objective function has one coupled constraint. For reproducibility, set the random seed and use the expected-improvement-plus acquisition function.
rng("default") resultNN = bayesopt(fun,params_bilayeredNet, ... AcquisitionFunctionName="expected-improvement-plus", ... NumCoupledConstraints=1);
|==================================================================================================================================================|
| Iter | Eval   | Objective  | Objective | BestSoFar  | BestSoFar  | Constraint1 | Activations |     Lambda | Layer_1_Size | Layer_2_Size |
|      | result |            | runtime   | (observed) | (estim.)   |             |             |            |              |              |
|==================================================================================================================================================|
|    1 | Infeas |   0.076667 |    3.8313 |        NaN |   0.076667 |     2.4e+03 |        none | 7.6806e-06 |           15 |          115 |
|    2 | Best   |       0.07 |    1.1425 |       0.07 |   0.070445 |        -196 |        none |  0.0001221 |            2 |            1 |
|    3 | Infeas |    0.46667 |   0.15246 |       0.07 |   0.070862 |    1.39e+03 |     sigmoid |     45.438 |           26 |           14 |
|    4 | Best   |   0.063333 |    1.3051 |   0.063333 |   0.063353 |       -52.5 |        tanh | 2.6069e-05 |            4 |            8 |
|    5 | Accept |    0.11333 |    1.4743 |   0.063333 |   0.063423 |       -58.5 |        relu | 2.2423e-05 |            4 |            7 |
|    6 | Accept |       0.07 |    1.1222 |   0.063333 |   0.063344 |        -196 |        none |  0.0001411 |            2 |            1 |
|    7 | Infeas |   0.046667 |    1.5327 |   0.063333 |    0.06318 |    1.95e+04 |        tanh | 1.2269e-07 |          300 |           16 |
|    8 | Infeas |    0.11333 |     5.227 |   0.063333 |   0.063575 |    9.47e+04 |        tanh |   0.045218 |          298 |          267 |
|    9 | Accept |    0.46667 |  0.023516 |   0.063333 |   0.063332 |        -196 |        none |     9.1357 |            2 |            1 |
|   10 | Infeas |    0.46667 |  0.025527 |   0.063333 |   0.063332 |    1.42e+03 |        relu |     3.0052 |           30 |            7 |
|   11 | Best   |   0.046667 |    2.0311 |   0.046667 |   0.046678 |        -172 |        relu |  6.691e-09 |            2 |            7 |
|   12 | Accept |   0.046667 |    1.0284 |   0.046667 |   0.046675 |       -52.5 |        tanh | 6.7859e-09 |            4 |            8 |
|   13 | Accept |   0.086667 |    2.4386 |   0.046667 |   0.046686 |        -172 |        relu | 1.1251e-07 |            2 |            7 |
|   14 | Accept |    0.46667 |  0.024936 |   0.046667 |    0.04668 |       -58.5 |        tanh |     60.245 |            4 |            7 |
|   15 | Best   |       0.03 |    1.0594 |       0.03 |   0.030086 |       -58.5 |        tanh |  0.0011383 |            4 |            7 |
|   16 | Infeas |    0.12333 |   0.12629 |       0.03 |    0.03007 |         296 |     sigmoid |  6.766e-09 |           10 |            8 |
|   17 | Accept |   0.076667 |   0.71763 |       0.03 |   0.030071 |        -146 |        none | 8.2973e-09 |            3 |            1 |
|   18 | Best   |   0.023333 |    1.0659 |   0.023333 |   0.026599 |       -58.5 |        tanh |  0.0009958 |            4 |            7 |
|   19 | Accept |   0.026667 |      1.01 |   0.023333 |    0.02661 |       -52.5 |        tanh |  0.0009402 |            4 |            8 |
|   20 | Accept |       0.05 |    1.3193 |   0.023333 |   0.026601 |        -226 |     sigmoid |  1.086e-05 |            1 |            8 |
|==================================================================================================================================================|
| Iter | Eval   | Objective  | Objective | BestSoFar  | BestSoFar  | Constraint1 | Activations |     Lambda | Layer_1_Size | Layer_2_Size |
|      | result |            | runtime   | (observed) | (estim.)   |             |             |            |              |              |
|==================================================================================================================================================|
|   21 | Accept |   0.036667 |    1.0198 |   0.023333 |   0.027248 |        -110 |        tanh | 0.00090677 |            3 |            8 |
|   22 | Infeas |   0.053333 |    5.9702 |   0.023333 |   0.027181 |    1.41e+04 |        tanh | 0.00048938 |          283 |            1 |
|   23 | Infeas |    0.12333 |   0.37451 |   0.023333 |   0.027429 |    1.71e+04 |        relu | 7.1367e-09 |          238 |           23 |
|   24 | Accept |   0.076667 |   0.92349 |   0.023333 |   0.029543 |        -248 |        none | 6.7138e-07 |            1 |            1 |
|   25 | Accept |   0.046667 |    1.3113 |   0.023333 |    0.02962 |        -226 |        tanh | 1.1434e-07 |            1 |            8 |
|   26 | Accept |   0.043333 |    1.3654 |   0.023333 |   0.029659 |        -168 |     sigmoid | 9.1787e-07 |            2 |            8 |
|   27 | Accept |   0.043333 |   0.71783 |   0.023333 |   0.029584 |        -226 |        tanh |  0.0018534 |            1 |            8 |
|   28 | Infeas |       0.06 |    3.8672 |   0.023333 |   0.030036 |    1.31e+04 |     sigmoid | 2.3192e-06 |          257 |            2 |
|   29 | Accept |   0.066667 |     1.257 |   0.023333 |   0.026647 |        -226 |        tanh | 0.00050488 |            1 |            8 |
|   30 | Accept |   0.036667 |   0.70965 |   0.023333 |   0.028015 |       -52.5 |        tanh |  0.0044111 |            4 |            8 |

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 60.5813 seconds
Total objective function evaluation time: 44.1746

Best observed feasible point:
    Activations    Lambda       Layer_1_Size    Layer_2_Size
    ___________    _________    ____________    ____________
       tanh        0.0009958         4               7

Observed objective function value = 0.023333
Estimated objective function value = 0.029092
Function evaluation time = 1.0659
Observed constraint violations =[ -58.500000 ]

Best estimated feasible point (according to models):
    Activations    Lambda       Layer_1_Size    Layer_2_Size
    ___________    _________    ____________    ____________
       tanh        0.0011383         4               7

Estimated objective function value = 0.028015
Estimated function evaluation time = 1.0464
Estimated constraint violations =[ -58.501089 ]
bayesopt finds optimal hyperparameters that minimize the error on the holdout validation set and satisfy the constraint. Extract the best point from the optimization results resultNN by using the bestPoint function.
[optimalParams,CriterionValue1,iteration] = bestPoint(resultNN)
optimalParams=1×4 table
Activations Lambda Layer_1_Size Layer_2_Size
___________ _________ ____________ ____________
tanh 0.0011383 4 7
CriterionValue1 = 0.0332
iteration = 15
Train the bilayered neural network model with the optimal hyperparameters.
rng("default") modelNNOpt = compact(fitcnet(xTrainSelected,yTrainMapped, ... Activations=char(optimalParams.Activations), ... LayerSizes=[optimalParams.Layer_1_Size optimalParams.Layer_2_Size], ... Lambda=optimalParams.Lambda));
Find the accuracy and size of the trained model.
OptimizedNNAccuracy = (1-loss(modelNNOpt,xTestSelected,yTestMapped))*100
OptimizedNNAccuracy = 93.3333
OptimizedNNSize = whos("modelNNOpt").bytes/1024
OptimizedNNSize = 8.3555
Quantize Model Parameters with Simulink Block
You can also reduce the memory footprint of a machine learning model by quantizing model parameters with a Simulink block. Statistics and Machine Learning Toolbox™ provides various prediction blocks that allow you to import a trained machine learning model into a Simulink model. In the prediction blocks, you can specify the data types for some or all model parameters as single precision, fixed point, half precision, and so on. For an example of fixed-point conversion, see Human Activity Recognition Simulink Model for Fixed-Point Deployment.
This example provides the Simulink model slexAcousticSceneClassificationNNPredictExample.slx, which includes the ClassificationNeuralNetwork Predict block. Open this model.
SimMdlName = 'slexAcousticSceneClassificationNNPredictExample';
open_system(SimMdlName)
Double-click the ClassificationNeuralNetwork Predict block to open the Block Parameters dialog box. You can specify the data types for the model parameters on the Data Types tab. To reduce the memory size, specify the data types for the layers as single. For details on specifying data types, see Specify Data Types Using Data Type Assistant (Simulink).
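Outside Simulink, you can roughly gauge the effect of single precision on the network weights of modelNNOpt. This snippet is a sketch for comparison only; the Simulink block performs the actual conversion in this example.

% Cast the layer weights and biases to single precision and compare storage.
w = [modelNNOpt.LayerWeights(:); modelNNOpt.LayerBiases(:)];
wSingle = cellfun(@single,w,UniformOutput=false);
whos("w","wSingle")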
Prepare the input data for the Simulink model. Convert the predictor data xTestSelected to single precision by using the single function.
soundInput.time = (0:size(xTestSelected,1)-1)';
soundInput.signals(1).values = single(xTestSelected);
soundInput.signals(1).dimensions = size(xTestSelected,2);
Simulate the Simulink model and assign the result to the out variable.
out = sim(SimMdlName);
Find the accuracy of the predict block using the data logged in the To Workspace (Simulink) block.
pred = categorical(out.simout.Data,unique(out.simout.Data), ...
    ["AllAround","Directional"]);
QuantizedNNAccuracy = sum(pred == yTestMapped)/length(yTestMapped)*100
QuantizedNNAccuracy = 93.3333
Find the size of the quantized model parameters.
p = Simulink.Mask.get("slexAcousticSceneClassificationNNPredictExample/ClassificationNeuralNetwork Predict");
vars = p.getWorkspaceVariables;
blockParams = vars(end).Value;
save("params.mat","blockParams")
s = dir("params.mat");
QuantizedNNSize = s.bytes/1024
QuantizedNNSize = 2.4951
Model Compression Summary
Display the changes in model size and accuracy during the model compression workflow for the bilayered neural network model. In general, the model loses some accuracy as you apply additional model compression schemes.
NNAccuracy = [multiMdlTbl{1,"Accuracy"} binaryMdlTbl{1,"Accuracy"} ...
    feat50binaryMdlTbl{1,"Accuracy"} ...
    OptimizedNNAccuracy QuantizedNNAccuracy];
NNSize = [multiMdlTbl{1,"Model Size"} binaryMdlTbl{1,"Model Size"} ...
    feat50binaryMdlTbl{1,"Model Size"} ...
    OptimizedNNSize QuantizedNNSize];
ModelType = ["Multiclass","Binary","50 Features","Optimized","Single Precision"];

figure
yyaxis left
b = bar(NNSize);
xtips = b.XEndPoints;
ytips = b.YEndPoints;
labels = string(round(b.YData,2));
text(xtips,ytips,labels,HorizontalAlignment="center", ...
    VerticalAlignment="bottom",Color='#0072BD')
ylabel("Model Size [KB]")
yyaxis right
plot(NNAccuracy,"-o")
ylabel("Accuracy [%]")
xticklabels(ModelType)
grid on
For the bilayered neural network model, the model size decreases to less than 30 KB after you reduce the number of features. The constrained optimization and converting data to single precision further reduce the model size.
The accuracy of the initial multiclass classification model is lower than that of the other models because the multiclass model classifies sounds into 15 types. After you simplify the multiclass problem into a binary classification problem, the models accurately classify more than 90% of the test data. Reducing the number of features leads to a loss of model accuracy, but the constrained optimization step improves accuracy, and converting the data to single precision does not reduce accuracy.
Helper Functions
The helperMdlMetrics function takes a cell array of trained models (Mdls) and test data sets (X and Y), and returns a table of model metrics that includes the model accuracy as a percentage and the model size in KB. The helper function uses the whos function to estimate the model size. However, the size returned by the whos function can be larger than the actual model size required in the generated code for deployment, because the generated code does not include information that is not needed for prediction. For example, the binary learners in a CompactClassificationECOC model object in the MATLAB workspace contain the ModelParameters property, but the model prepared for deployment in the generated code does not contain this property.
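For instance, you can inspect this property in the multiclass logistic regression model trained earlier. This snippet is a sketch for illustration:

% Inspect one binary learner's ModelParameters property, which exists in
% the workspace object but is omitted from code generated for deployment.
ecocMdl = multiMdls{5};   % CompactClassificationECOC with logistic regression learners
ecocMdl.BinaryLearners{1}.ModelParameters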
function tbl = helperMdlMetrics(Mdls,X,Y)
numMdl = length(Mdls);
metrics = NaN(numMdl,2);
for i = 1:numMdl
    Mdl = Mdls{i};
    MdlInfo = whos("Mdl");
    metrics(i,:) = [(1-loss(Mdl,X,Y))*100 MdlInfo.bytes/1024];
end
tbl = array2table(metrics, ...
    VariableNames=["Accuracy","Model Size"]);
end
The helperOptimizeConstrainedBilayer function trains a bilayered neural network model using a given set of parameters for the training data, and returns the loss for the holdout validation set. In addition, the function accepts the upper limit (maxSize) for the number of weight parameters in the model and returns a constraint value. A positive constraint value indicates that the number of parameters is greater than the specified limit maxSize.
function [objective,constraint] = helperOptimizeConstrainedBilayer(params,xTrain,yTrain,xEval,yEval,maxSize)
mdl = fitcnet(xTrain,yTrain, ...
    Activations=char(params.Activations), ...
    LayerSizes=[params.Layer_1_Size params.Layer_2_Size], ...
    Lambda=params.Lambda);
objective = loss(mdl,xEval,yEval);
% Estimate the number of weights in the two fully connected layers and
% the output layer (bias terms are not counted).
numClasses = size(unique(yTrain),1);
sizeEst = size(xTrain,2)*params.Layer_1_Size + ...
    params.Layer_1_Size*params.Layer_2_Size + ...
    params.Layer_2_Size*numClasses;
% A positive value indicates that the estimate exceeds maxSize.
constraint = sizeEst - maxSize - 0.5;
end
More About
For optimization with coupled constraints, you can consider constraining these hyperparameters and model characteristics to limit the memory use, depending on the type of machine learning model:
Decision tree — Minimum number of leaf node observations (MinLeafSize) and the maximum number of decision splits (MaxNumSplits). A decision tree model has a small memory footprint. For a sketch of this case, see the code after this list.
Linear discriminant and logistic regression — Number of features and classes. Both a linear discriminant model and a logistic regression model have a small to medium memory footprint.
Shallow neural network — Number of fully connected layers and the number of hidden units in each layer (LayerSizes). A shallow neural network model has a small to medium memory footprint.
k-nearest neighbor — Training data size, the number of nearest neighbors (NumNeighbors), and the maximum number of data points in the leaf node for the Kd-tree algorithm (BucketSize). A k-nearest neighbor model has a medium memory footprint.
Support vector machine (SVM) — Number of support vectors, determined by the box constraint (BoxConstraint). An SVM has a medium to large memory footprint. For an SVM model that uses the linear kernel function, you can reduce the footprint by discarding the support vectors from the model using the discardSupportVectors function. The reduced SVM model can still predict new data using the predictor coefficients (Beta property) stored in the model.
Ensemble — Number of learners and the size of each learner, determined by NumLearningCycles and Learners. An ensemble has a medium to large memory footprint.
Gaussian process regression (regression only) — Size of the active set (ActiveSetSize). A Gaussian process regression model has a medium to large memory footprint.
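As a sketch of the decision tree case, you can cap the tree size directly at training time. This code uses the selected-feature data from this example; the specific limits are illustrative, not tuned values.

% Limit the tree's size with MinLeafSize and MaxNumSplits, and check the footprint.
treeMdl = compact(fitctree(xTrainSelected,yTrainMapped, ...
    MinLeafSize=20,MaxNumSplits=15));
treeInfo = whos("treeMdl");
treeSizeKB = treeInfo.bytes/1024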
Several factors determine the memory use of a machine learning model, but in general, a decision tree model has a small memory footprint; a linear discriminant model, logistic regression model, and shallow neural network model have a small to medium memory footprint; a k-nearest neighbor model has a medium memory footprint; and an SVM, ensemble, and Gaussian process regression model have a medium to large memory footprint.
For deployment to memory-limited hardware, a recommended practice is to specify training data using a matrix, not a table. If you specify training data using a table, some model properties, such as PredictorNames, can take a considerable proportion of the model memory footprint.
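As a sketch of the difference, you can compare an SVM model trained on a table with one trained on a matrix; the size of the gap depends on the data set.

% Compare the footprint of models trained on a table vs. a numeric matrix.
tblTrain = array2table(xTrainSelected);
mdlFromTbl = compact(fitcsvm(tblTrain,yTrainMapped));
mdlFromMat = compact(fitcsvm(xTrainSelected,yTrainMapped));
whos("mdlFromTbl","mdlFromMat")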
References
[1] Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen. "Acoustic Scene Classification: An Overview of DCASE 2017 Challenge Entries." In Proc. International Workshop on Acoustic Signal Enhancement, 2018.
[2] Lostanlen, Vincent, and Joakim Anden. "Binaural Scene Classification with Wavelet Scattering." Technical Report, DCASE2016 Challenge, 2016.
[3] Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen. TUT Acoustic Scenes 2017, Development Dataset. Zenodo, 2017.
[4] Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen. TUT Acoustic Scenes 2017, Evaluation Dataset. Zenodo, 2017.
See Also
fscmrmr | bayesopt | discardSupportVectors | ClassificationNeuralNetwork Predict