Most frequent word in text

Roger Nadal 2019-11-27
How can I print every word in a text together with how many times it appears, one word per line, ordered from most to least frequent?
  4 Comments
Walter Roberson 2019-11-27
Edited: Walter Roberson 2019-11-27
"No" ? So "can't" is not a "word", and "John's" is not a word, and "self-expression" is not a word? If the file happened to contain
John's self-expression can't runh 7 tlick.
then what would the desired output be?
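To make the ambiguity concrete, here is a minimal sketch of one possible reading. It assumes (the question does not say so) that a "word" is a run of letters that may contain internal apostrophes or hyphens, so a bare number such as 7 is dropped. The pattern and the variable names are illustrative only.

```matlab
% Sample line from the comment above.
str = 'John''s self-expression can''t runh 7 tlick.';

% Assumed definition of a word: letters, optionally joined by ' or -.
pattern = '[A-Za-z]+(?:[''-][A-Za-z]+)*';
tokens = regexp(str, pattern, 'match');

% One token per line.
fprintf('%s\n', tokens{:});
```

Under a different definition (for example, splitting on whitespace only) the tokens "7" and "tlick." would come out differently, which is why the desired output needs to be stated.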


Answers (1)

Image Analyst 2019-11-27
Try this:
str = '123 zxy abc def abc def abc last word';
% str = fileread(fileName); % Read in text from disk file.
words = strsplit(str);          % Split on whitespace into a cell array of words.
uniqueWords = unique(words)     % Distinct words, in sorted order.
numUniqueWords = length(uniqueWords)
wordCounts = zeros(numUniqueWords, 1);
for k = 1 : numUniqueWords
    thisWord = uniqueWords{k};
    indexes = strcmp(words, thisWord);   % True wherever this word occurs.
    wordCounts(k) = sum(indexes);        % Number of occurrences.
end
% Show results in command window
wordCounts
Do you have the Text Analytics Toolbox? There are probably functions in that toolbox that get a histogram of words more easily than this.
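The question also asks for one word per line, ordered from most to least frequent. A minimal sketch of that step, using only base MATLAB and the same sample string, might look like the following. The use of accumarray and the variable names idx, order, sortedWords, and sortedCounts are my own choices, not part of the answer above.

```matlab
str = '123 zxy abc def abc def abc last word';
words = strsplit(strtrim(str));

% idx maps each word to its position in uniqueWords.
[uniqueWords, ~, idx] = unique(words);

% Count how many times each unique word occurs.
counts = accumarray(idx(:), 1);

% Order from most to least frequent.
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = uniqueWords(order);

% Print one word per line with its count.
for k = 1 : numel(sortedWords)
    fprintf('%s %d\n', sortedWords{k}, sortedCounts(k));
end
```

For the sample string this lists abc (3 occurrences) first and def (2 occurrences) second, followed by the words that occur once.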
  3 Comments
Walter Roberson 2019-11-27
If you have a cell array of character vectors containing the words, then you can use
randperm(number_of_words, number_to_choose_randomly)
to get that many random indices, and index the cell array with them to get a cell array of randomly chosen words. After that, your task is reduced to displaying them, such as
fprintf('%s\n', TheCellArray{:});
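Put together, the random selection described above could be sketched as follows. The sample word list and the names numberToChoose and pickIdx are placeholders for illustration.

```matlab
% Hypothetical word list; in practice this would come from the text.
words = {'zxy', 'abc', 'def', 'last', 'word'};
numberToChoose = 3;

% randperm returns distinct random indices, not the words themselves.
pickIdx = randperm(numel(words), numberToChoose);

% Index the cell array to get the chosen words.
TheCellArray = words(pickIdx);

% Display one word per line.
fprintf('%s\n', TheCellArray{:});
```

Because randperm returns unique indices, no position in the list is picked twice; a word can still repeat in the output if it appears more than once in the list.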
Image Analyst 2019-11-27
Roger, you might find Talk To Transformer fun. It will generate sentences using a neural network. So even though it generates gibberish, it's not just random words. The grammar is right, with nouns, adjectives, etc. in the right places, and the sentence structure is right. For example, when I typed in "I like to use MATLAB Answers." below is how it completed the paragraph.
"I like to use MATLAB Answers. There's one new way to run a simulation if I have time, and that's to run the Model of a Power Grapher experiment with a mesh that's made of a grid that covers the corresponding coordinates. The reason for this is that the Lattice Proximal layer doesn't cover each coordinate perfectly, meaning that each layer overlaps some areas, which introduces a kind of noise to the output image. My current theory is that the noise causes the software not to converge as well. Unfortunately, I don't have the equipment."
I've seen one professor feed the complete works of Shakespeare into a network. After the first epoch the output was just random letters, then after a few hundred more it was breaking them into words, then sentences. After even more it was getting the grammar right. After more and more epochs the text got more and more reasonable and less gibberish-sounding. He thinks if he trained it for weeks, it might produce something that sounded very reasonable.
