Hi there,
I'm training some SVMs on moderate-dimensional data (a few thousand observations by at most a few hundred features) using the following core function call:
model = fitclinear(X_subset(train_idx,:), Y_subset(train_idx), ...  % Y_subset holds the class labels
    'Regularization', 'lasso', ...
To assess the degree to which performance depends on the features jointly, rather than individually, I add the features in one at a time and re-run the classifier (each step is essentially the sketch below). The accuracy curve that I get out looks like this (red and blue are two different experimental replicates):
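For concreteness, the sweep is roughly the following; this is just a sketch with placeholder names (test_idx, n_features) standing in for my actual bookkeeping, and it assumes the labels are numeric or categorical:

accuracy = zeros(n_features, 1);
for k = 1:n_features
    % Fit on the first k features only, same options as above
    model = fitclinear(X_subset(train_idx, 1:k), Y_subset(train_idx), ...
        'Regularization', 'lasso');
    % Fraction correct on the held-out observations
    pred = predict(model, X_subset(test_idx, 1:k));
    accuracy(k) = mean(pred == Y_subset(test_idx));
end
plot(1:n_features, accuracy)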
As the plot shows, once the number of features passes 100, the classifier's behavior changes dramatically: accuracy drops and becomes much more stochastic from run to run. I presume this is because, on the back end, MATLAB is switching the solver from sparsa to sgd, as implied by the docs but not explicitly stated. For a second dataset I have, where the overall accuracy is higher (~80%), the effect is still present but not as dramatic.
Is there a way to prevent MATLAB from switching to sgd? I will try passing the coefficients from the nfeatures=100 model as a warm start to the subsequent models (roughly as sketched below), but even if that fixes my specific problem, this feels like a bug worth reporting more generally.
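In case it clarifies the plan, the warm start I have in mind looks roughly like this; it's only a sketch, and I'm assuming from memory that fitclinear accepts initial estimates through the 'Beta' and 'Bias' name-value pairs:

% Fit with the first 100 features, while the default solver is still sparsa
model100 = fitclinear(X_subset(train_idx, 1:100), Y_subset(train_idx), ...
    'Regularization', 'lasso');

% Warm-start the 101-feature fit from the 100-feature solution,
% padding the coefficient vector with a zero for the newly added feature
beta0 = [model100.Beta; 0];
model101 = fitclinear(X_subset(train_idx, 1:101), Y_subset(train_idx), ...
    'Regularization', 'lasso', ...
    'Beta', beta0, 'Bias', model100.Bias);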
Thanks.