Trying to sort the words of a text document in alphabetical order and count how many times each word appears

Hello,
I am trying to import a .txt file, count how many times each word appears, and then sort all the words in alphabetical order.
I want to read the .txt file into a string, chop the string up into individual words, and then put the words into a matrix along with their word counts.
Take the following sentence as an example: "The cat is a cat and would like to have a cat."
The output would look like the following:
word:a count:2
word:and count:1
word:cat count:3
word:have count:1
word:is count:1
word:like count:1
word:the count:1
word:to count:1
word:would count:1
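For reference, one possible way to reproduce that expected output for the example sentence is sketched below. It assumes a "word" is any run of the letters a-z (so punctuation and digits are dropped) and uses `regexp`, `unique` and `accumarray`; the variable names are illustrative only.

```matlab
% Example sentence from the question
s = 'The cat is a cat and would like to have a cat.';

% Lower-case the text and pull out runs of letters as words
% (assumption: a "word" is any run of the letters a-z)
words = regexp(lower(s), '[a-z]+', 'match');

% unique returns the distinct words in sorted (alphabetical) order;
% idx maps every original word to its position in uniqueWords
[uniqueWords, ~, idx] = unique(words);

% Count how many times each position occurs
counts = accumarray(idx(:), 1);

% Print in the "word:... count:..." format
for n = 1:numel(uniqueWords)
    fprintf('word:%s count:%d\n', uniqueWords{n}, counts(n));
end
```

Because the sentence contains "is" once, this also prints a line for it between "have" and "like".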
Here is what I have right now.
fid = fopen('Theodore_Roosevelt_The_Duties_Of_American_Citizenship.txt');
Line = fgetl(fid);
textfile = strings(1,1);
k = 1;
while ischar(Line)
textfile(k,1) = Line;
Line = fgetl(fid);
k = k+1;
end
fclose(fid);
% replace common punctuation characters with spaces (not every non-letter character)
punc = [".",";",",","(",")","--","-",];
textfile = replace(textfile,punc," ");
textfile=lower(textfile);
% split each line of textfile into individual words
words = strings(0);
for i = 1:length(textfile)
words = [words;split(textfile(i))];
end
k=convertStringsToChars(words);
h=1;
F=2;
G=1;
sorted=zeros(1);
for j = 1:length(k)
T=strncmp(k(h),k(F),1);
if T==1
%if true, put h word in sorted before word F
sorted=sorted(k(h),G);
G=G+1;
sorted=sorted(k(F),G);
end
h=h+2;
F=F+2;
end
disp(sorted)
This is the error I get when executing the code:
Error using sort
Input argument must be a cell array of character vectors.
Error in sorted (line 28)
[ignored,index] = sort([meshsites(:).' sites(:).']);
Error in fgetltest (line 35)
sorted=sorted(k(h),G);
This is for a homework assignment, but I am lost on the part where the words and their counts go into a matrix.
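On the matrix part: a numeric array such as the one created by `zeros` can only hold numbers, so in MATLAB words and counts are usually stored together in a cell array (or a table). A minimal sketch, assuming the words are already in a cell array of character vectors like the one `convertStringsToChars` returns for a multi-element string array; the `words` list here is hypothetical sample data.

```matlab
% Hypothetical input: words after splitting on spaces, which can
% leave empty entries behind when spaces are consecutive
words = {'the','cat','','is','a','cat','and','a','cat',''};

% Drop the empty entries
words = words(~cellfun(@isempty, words));

% Distinct words in alphabetical order, plus how often each occurs
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx(:), 1);

% A numeric matrix cannot hold text, so store the pairs in an
% N-by-2 cell array: column 1 = word, column 2 = count
wordTable = [uniqueWords(:), num2cell(counts)];
disp(wordTable)
```

Since `unique` already returns the words sorted, no separate pairwise comparison loop is needed for the alphabetical order.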
Andy T, 2019-11-3 (edited by Andy T, 2019-11-3)
The MATLAB version is 2019. I also forgot to include those lines when I pasted the code into the question; let me fix that. I also forgot to create a vector called sorted.


Accepted Answer

Adam Danz, 2019-11-3 (edited by Adam Danz, 2019-11-3)
Here's a different approach. See inline comments for details.
% Read text
C = fileread('Theodore_Roosevelt_The_Duties_Of_American_Citizenship.txt');
% Split into words at whitespace (spaces, tabs, newlines)
words = strsplit(strtrim(C));
% Remove problematic characters
% But be careful: this removes any non-letter from each word.
% cat's turns into cats; But without this "cats" with quotes
% or cats! will not be recognized. If that's a problem you'll
% need to use a regular expression approach.
words = cellfun(@(x)x(isletter(x)), words, 'UniformOutput', false);
% Drop words that are now empty (e.g. a stand-alone "--")
words(cellfun(@isempty, words)) = [];
% make all letters lower case
words = lower(words);
% sort them into alphabetical order
words = sort(words);
% Count frequency of each word
wordList = unique(words);
wordCount = histcounts(categorical(words), categorical(wordList));
% Output table
T = table(wordList(:), wordCount(:), 'VariableNames', {'Word', 'Count'});
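The comments in the answer note that stripping every non-letter turns cat's into cats and suggest a regular expression instead. A minimal sketch of that idea, assuming words are runs of ASCII letters optionally joined by apostrophes (non-English letters would need a different pattern); the sample text is made up. Because the pattern requires at least one letter, it never produces empty words.

```matlab
% Hypothetical sample text with quotes, punctuation and an apostrophe
C = '"Cats!" said the cat''s owner -- cats, cats.';

% Match runs of letters, optionally joined by apostrophes (cat's, don't).
% Quotes, dashes and other punctuation are simply not matched.
words = regexp(lower(C), '[a-z]+(''[a-z]+)*', 'match');

% Distinct words (sorted) and their frequencies
[wordList, ~, idx] = unique(words);
wordCount = accumarray(idx(:), 1);

% Same output layout as the answer
T = table(wordList(:), wordCount(:), 'VariableNames', {'Word', 'Count'});
disp(T)
```

Here cat's and cats are counted separately, and the quoted "Cats!" is counted as cats.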