Matrix index is out of range for deletion

Question

oliver 2023-4-10

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1944759-matrix-index-is-out-of-range-for-deletion

Commented: Walter Roberson 2023-4-10

Accepted Answer: Walter Roberson

IMBD_reviews_smol.csv

My project is sentiment analysis. I am trying to follow the tutorial "Create Simple Text Model for Classification".

My dataset is a list of reviews with labelled sentiment (either 'positive' or 'negative').

I am trying to remove any documents containing no words from the bag-of-words model, and to remove the corresponding entries in the labels.

My code is:

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
 
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = []; % produces the error "Deletion requires an existing variable."
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
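The message "Deletion requires an existing variable." comes from a name mismatch, not from the indices. The labels were stored in YTrain (capital T), but the deletion targets Ytrain. MATLAB identifiers are case-sensitive, so Ytrain is a different variable that was never created. Below is a minimal base-MATLAB sketch with made-up labels. The values are illustrative only, and idx stands in for the second output of removeEmptyDocuments:

```matlab
% Illustrative labels and indices (made up, not from the IMDB file)
YTrain = categorical(["positive"; "negative"; "positive"; "negative"]);
idx = [2 4];   % positions to delete, like the idx output of removeEmptyDocuments

% Deleting from the variable that actually exists works:
YTrain(idx) = [];   % YTrain is now 2x1

% A different capitalization names a different (undefined) variable.
% Uncommenting the next line raises "Deletion requires an existing variable."
% Ytrain(idx) = [];

disp(size(YTrain))
```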

7 Comments

oliver 2023-4-10

With this code I receive the following error at line 25, mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");

Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
YTrain = [];
XTrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
documentsTest = preprocessText(textDataTest);
XTest = encode(bag,documentsTest);
YPred = predict(mdl,XTest);
acc = sum(YPred == YTest)/numel(YTest);
str = [
    "i hated this movie."
    "this was really good"
    "sometimes slow movies work out in the way you want and thats how this movie went"];
documentsNew = preprocessText(str);
XNew = encode(bag,documentsNew);
labelsNew = predict(mdl,XNew);
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end

Walter Roberson 2023-4-10

Yes, as I indicated, you are removing all documents from the bag, so your training information becomes empty.
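The second script has two more problems. First, YTrain = []; replaces the whole label vector with an empty array instead of deleting only the rows listed in idx. Second, the next line passes Xtrain, but the counts were stored in XTrain. Here is a base-MATLAB sketch of the difference between the two deletions, followed by a size check before fitting. The check is an illustrative addition, not part of the thread, and the sparse matrix is a hypothetical stand-in for bag.Counts:

```matlab
% Illustrative labels for four documents; document 3 is treated as empty
YTrain = categorical(["positive"; "negative"; "negative"; "positive"]);
idx = 3;

Yall = YTrain;
Yall = [];          % assigns an empty array: every label is gone (0x0 double)

Ysel = YTrain;
Ysel(idx) = [];     % deletes only element 3: Ysel is 3x1 categorical

% Hypothetical stand-in for bag.Counts after removeEmptyDocuments:
% one row per remaining document, one column per vocabulary word
XTrain = sparse([1 0 2 0 0; 0 1 0 0 1; 1 1 0 1 0]);

% Predictors and labels must describe the same documents before fitting
assert(size(XTrain, 1) == numel(Ysel), 'XTrain and the labels have different numbers of rows')
```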

Answer 1

Walter Roberson 2023-4-10

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1944759-matrix-index-is-out-of-range-for-deletion#answer_1213124

Moved: Walter Roberson 2023-4-10

IMBD_reviews_smol.csv

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
 
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
whos Ytrain idx
  Name          Size             Bytes  Class          Attributes

  Ytrain      181x1                423  categorical              
  idx           1x181             1448  double                   
Ytrain(idx) = []; %produces an error 
Xtrain = bag.Counts;
whos
  Name                 Size              Bytes  Class                Attributes

  Xtrain               0x0                  24  double               sparse    
  Ytest               20x1                 262  categorical                    
  Ytrain               0x1                 242  categorical                    
  ans                  1x46                 92  char                           
  bag                  1x1                 640  bagOfWords                     
  cmdout               1x33                 66  char                           
  cvp                  1x1                3278  cvpartition                    
  data               201x2              543470  table                          
  dataTest            20x2               66077  table                          
  dataTrain          181x2              478944  table                          
  documents          181x1               43321  tokenizedDocument              
  filename             1x1                 178  string                         
  idx                  1x181              1448  double                         
  textDataTest        20x1               64602  string                         
  textDataTrain      181x1              477308  string                         
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

Error in ClassificationECOC.prepareData (line 128)
                classreg.learning.classif.FullClassificationModel.prepareData(X,Y,varargin{:});

Error in classreg.learning.FitTemplate/fit (line 246)
                    this.PrepareData(X,Y,this.BaseFitObjectArgs{:});

Error in ClassificationECOC.fit (line 119)
            this = fit(temp,X,Y);

Error in fitcecoc (line 357)
    obj = ClassificationECOC.fit(X,Y,varargin{:});
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end

You are removing all of the documents. The bag is left empty.
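In the whos output above, idx is 1x181 and Ytrain was 181x1. This means removeEmptyDocuments reported every training document as empty. A cheap guard can flag this before fitcecoc fails with the less obvious "No class names are found in input labels." The following is a base-MATLAB sketch. The values only mimic the sizes shown above, and the warning text is illustrative:

```matlab
% Illustrative values shaped like the whos output above: 181 training labels,
% and removeEmptyDocuments returned all 181 positions in idx.
Ytrain = categorical(repmat("positive", 181, 1));
idx = 1:181;

numRemaining = numel(Ytrain) - numel(idx);
if numRemaining == 0
    % Flag the problem before fitcecoc fails with
    % "No class names are found in input labels."
    warning('All training documents are empty after preprocessing; check preprocessText.')
end
```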

2 Comments

oliver 2023-4-10

Edited: Walter Roberson 2023-4-10

I am trying to follow this MATLAB example, https://uk.mathworks.com/help/textanalytics/ug/create-simple-text-model-for-classification.html, but with my own dataset. Can you help with what I need to change?

Walter Roberson 2023-4-10

IMBD_reviews_smol.csv

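You were calling removeShortWords twice. As a result, the second call, removeShortWords(documents,15), removed every word of 15 or fewer characters. The few remaining "words" all happened to be unique, so removeInfrequentWords(bag,2) removed them as well, and the bag ended up empty. Use removeLongWords for the upper length limit, as the tutorial does: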

filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = [];
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
mdl

mdl =
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [negative positive]
    ScoreTransform: 'none'
    BinaryLearners: {[1×1 ClassificationLinear]}
      CodingMatrix: [2×1 double]

function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);
end
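The effect of the one-line change in preprocessText can be illustrated without the Text Analytics Toolbox. The sketch below applies strlength to a plain string array to show which token lengths survive each version. It follows the documented rules, where removeShortWords(documents,len) removes words with len or fewer characters and removeLongWords(documents,len) removes words with len or more characters. It is not how the toolbox functions are implemented, and the tokens are made up:

```matlab
% Illustrative tokens (not from the dataset) and their character counts
words = ["a" "ok" "movie" "disappointment" "cinematographer" "overdramatization"];
len = strlength(words);   % 1 2 5 14 15 17

% Original preprocessText: removeShortWords(...,2) then removeShortWords(...,15).
% The second call drops every word with 15 or fewer characters.
keptOriginal = words(len > 15);

% Corrected preprocessText: removeShortWords(...,2) then removeLongWords(...,15)
% keeps only words with 3 to 14 characters.
keptCorrected = words(len > 2 & len < 15);
```

With the original version, only the unusually long token survives. Tokens that rare are exactly what removeInfrequentWords(bag,2) then strips out.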
