Using fitcknn in MATLAB

MiauMiau, 2014-12-6
Hi,
I want to use fitcknn with a custom distance metric, in my case the Levenshtein distance:
mdl = fitcknn(citynames,citycodes,'NumNeighbors', 50, 'exhaustive','Distance',@levenshtein);
This doesn't work, even though the documentation says: "Distance metric, specified as the comma-separated pair consisting of 'Distance' and a valid distance metric string or function handle."
The error I get is:
Error using internal.stats.parseArgs (line 42) Wrong number of arguments.
Error in classreg.learning.generator.Partitioner.processArgs (line 65) [cvpart,crossval,kfold,holdout,leaveout,~,otherArgs] = ...
Error in ClassificationKNN.fit (line 728) Nfold = classreg.learning.generator.Partitioner.processArgs(varargin{:});
Error in fitcknn (line 263) this = ClassificationKNN.fit(X,Y,varargin{:});
Error in NNlevenshtein (line 8) mdl = fitcknn(citynames,citycodes,'NumNeighbors', 50, 'exhaustive','Distance',@levenshtein);

Answers (1)

Star Strider, 2014-12-6
We need to see your code for levenshtein.
According to the documentation, your levenshtein function has to have the form:
function D2 = DISTFUN(ZI,ZJ)
% calculation of distance
...
where
  • ZI is a 1-by-N vector containing one row of X or Y.
  • ZJ is an M2-by-N matrix containing multiple rows of X or Y.
  • D2 is an M2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).
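As a sketch of how a pairwise Levenshtein function could be adapted to this form, the wrapper below assumes each string has been stored as one row of a numeric matrix holding its character codes, right-padded with zeros (code 0 never occurs inside a real name). The names levdist and editdist are hypothetical; editdist repeats the usual two-row dynamic-programming recurrence so the file runs on its own. Saved as levdist.m, it could be passed as 'Distance',@levdist.

```matlab
function D2 = levdist(ZI, ZJ)
% LEVDIST  Hypothetical DISTFUN wrapper for fitcknn's 'Distance' option.
% Assumption: each row encodes one string as character codes, right-padded
% with zeros (code 0 never occurs inside a real string).
%   ZI : 1-by-N row vector (one observation)
%   ZJ : M2-by-N matrix (several observations)
%   D2 : M2-by-1 vector, D2(k) = Levenshtein distance between ZI and ZJ(k,:)
a = char(ZI(ZI ~= 0));              % strip the zero padding from ZI
D2 = zeros(size(ZJ, 1), 1);
for k = 1:size(ZJ, 1)
    row = ZJ(k, :);
    D2(k) = editdist(a, char(row(row ~= 0)));
end
end

function d = editdist(s1, s2)
% Two-row dynamic-programming Levenshtein distance.
prev = 0:numel(s2);                 % distances from the empty prefix of s1
for i = 1:numel(s1)
    curr = zeros(size(prev));
    curr(1) = i;                    % i deletions turn s1(1:i) into ''
    for j = 1:numel(s2)
        curr(j+1) = min([prev(j+1) + 1, ...            % delete s1(i)
                         curr(j) + 1, ...              % insert s2(j)
                         prev(j) + (s1(i) ~= s2(j))]); % substitute or match
    end
    prev = curr;
end
d = prev(end);
end
```

For example, levdist([double('kitten') 0], [double('sitting'); double('kittens')]) should return [3; 1]: "kitten" needs three edits to become "sitting" and one to become "kittens".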
2 Comments
MiauMiau, 2014-12-6 (edited by Star Strider, 2014-12-6)
Oh, I see. I used some code published on GitHub (below) whose inputs are strings, but I think I can first convert my data to ASCII character codes.
function score = levenshtein(s1, s2)
% score = levenshtein(s1, s2)
%
% Calculates the Levenshtein (edit) distance between two strings: the
% minimum number of single-character insertions, deletions, and
% substitutions needed to turn one string into the other.
%
% s1: string
% s2: string
% score: levenshtein distance
%
% Author: Ben Hamner (ben@benhamner.com)
if length(s1) < length(s2)
    score = levenshtein(s2, s1);    % make s1 the longer string
elseif isempty(s2)
    score = length(s1);
else
    % Keep only two rows of the dynamic-programming table.
    previous_row = 0:length(s2);
    for i = 1:length(s1)
        current_row = zeros(size(previous_row));
        current_row(1) = i;
        for j = 1:length(s2)
            insertions = previous_row(j+1) + 1;
            deletions = current_row(j) + 1;
            substitutions = previous_row(j) + (s1(i) ~= s2(j));
            current_row(j+1) = min([insertions, deletions, substitutions]);
        end
        previous_row = current_row;
    end
    score = current_row(end);
end
end
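A minimal sketch of that conversion, assuming citynames is a cell array of character vectors (the four names below are made-up example data): each name becomes one row of a numeric matrix holding its character codes, right-padded with zeros so that all rows have the same length, since fitcknn needs a numeric predictor matrix. Zero padding is used here instead of char(citynames), which pads with spaces that a distance function would otherwise count as real characters.

```matlab
% Hypothetical example data standing in for the question's citynames.
citynames = {'Berlin'; 'Bern'; 'Boston'; 'Austin'};

% Encode each name as a row of character codes, right-padded with zeros.
N = max(cellfun(@numel, citynames));    % length of the longest name
X = zeros(numel(citynames), N);
for k = 1:numel(citynames)
    codes = double(citynames{k});
    X(k, 1:numel(codes)) = codes;
end

% Decoding one row: drop the zero padding and convert back to text.
name3 = char(X(3, X(3, :) ~= 0));
```

The padding has to be stripped again inside the distance function before two strings are compared.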
Star Strider, 2014-12-6 (edited by Star Strider, 2014-12-6)
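I had to look up Levenshtein distance. It measures the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into another; for example, "kitten" and "sitting" are at distance 3 (two substitutions and one insertion). I don’t see any reason for it not to work in a k-NN classifier.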
I had to review the documentation on fitcknn since I’ve not used it in a while. I’ve also never encountered a problem such as yours.
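You likely don’t have to specify 'exhaustive' at all, since according to the documentation fitcknn uses the exhaustive search method by default when the distance is a custom function. If you do specify it, it has to be the value of the 'NSMethod' name-value pair. Without 'NSMethod', the name-value list ('NumNeighbors', 50, 'exhaustive', 'Distance', @levenshtein) has an odd number of elements, so the parser cannot pair them up, which is most likely what throws the "Wrong number of arguments" error.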
See if:
mdl = fitcknn(citynames,citycodes, 'NumNeighbors',50, 'NSMethod','exhaustive', 'Distance',@levenshtein);
or
mdl = fitcknn(citynames,citycodes,'NumNeighbors', 50,'Distance',@levenshtein);
(without 'exhaustive') works.
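Putting the pieces together, here is a hedged end-to-end sketch (it needs the Statistics and Machine Learning Toolbox). The function and helper names (knn_levenshtein_demo, encodeStrings, levdist, editdist), the city names and codes, and 'NumNeighbors' set to 1 are illustration choices, not taken from the question. The strings are encoded as zero-padded rows of character codes because fitcknn expects a numeric predictor matrix, and the distance function strips that padding before comparing.

```matlab
function pred = knn_levenshtein_demo()
% Hypothetical sketch: k-NN classification of strings with a custom
% Levenshtein distance (requires Statistics and Machine Learning Toolbox).
% Made-up example data; replace with citynames/citycodes from the question.
citynames = {'Berlin'; 'Bern'; 'Boston'; 'Bostn'};
citycodes = {'DE'; 'CH'; 'US'; 'US'};

X = encodeStrings(citynames);   % numeric matrix, one zero-padded row per name

% 'exhaustive' is passed as the value of 'NSMethod', so every name has a value.
mdl = fitcknn(X, citycodes, 'NumNeighbors', 1, ...
    'NSMethod', 'exhaustive', 'Distance', @levdist);

% Queries must be encoded with the same number of columns as X.
Q = encodeStrings({'Berlim'; 'Bostom'}, size(X, 2));
pred = predict(mdl, Q);
end

function X = encodeStrings(strs, N)
% One row of character codes per string, right-padded with zeros to N columns.
% Assumes no string is longer than N.
if nargin < 2
    N = max(cellfun(@numel, strs));
end
X = zeros(numel(strs), N);
for k = 1:numel(strs)
    codes = double(strs{k});
    X(k, 1:numel(codes)) = codes;
end
end

function D2 = levdist(ZI, ZJ)
% DISTFUN form: ZI is 1-by-N, ZJ is M2-by-N, D2(k) = distance(ZI, ZJ(k,:)).
a = char(ZI(ZI ~= 0));          % drop the zero padding
D2 = zeros(size(ZJ, 1), 1);
for k = 1:size(ZJ, 1)
    row = ZJ(k, :);
    D2(k) = editdist(a, char(row(row ~= 0)));
end
end

function d = editdist(s1, s2)
% Two-row dynamic-programming Levenshtein distance.
prev = 0:numel(s2);
for i = 1:numel(s1)
    curr = zeros(size(prev));
    curr(1) = i;
    for j = 1:numel(s2)
        curr(j+1) = min([prev(j+1) + 1, curr(j) + 1, ...
                         prev(j) + (s1(i) ~= s2(j))]);
    end
    prev = curr;
end
d = prev(end);
end
```

With this toy data each misspelled query is one edit away from its intended city, so pred should be {'DE'; 'US'}.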
