Word Count in a PDF file

2 次查看(过去 30 天)
I have a PDF file "EHP.pdf", I want to count the total number of words in that file? This file has many sections I want to exclude the last section from the calculations. Any suggestions?

采纳的回答

Omer Yasin Birey
Omer Yasin Birey 2018-12-20
编辑:Omer Yasin Birey 2018-12-21
Hi Ahmed, you can use extractFileText. You must choose a starter word and a finisher word, this word must be unique. Because, counting will end when Matlab encounters this word. By this way you can count the words between the starter and finisher.
str = extractFileText("EHP.pdf");
i = strfind(str,"firstWord"); % write here the first word of your pdf
ii = strfind(str,"lastWord"); % write here the last word of your pdf, that must be distinctive
start = i(1);
fin = ii(1);
extracted = extractBetween(str,start,fin-1)
uniqueWordNumbers = wordCloudCounts(extracted);
counter = uniqueWordNumbers(:,2);
counterArray = table2array(counter);
totalWords = sum(counterArray);
  3 个评论
Omer Yasin Birey
Omer Yasin Birey 2018-12-20
Ah, You are right Ahmed. I made a typo and also forgot a line there, try this instead:
counter = uniqueWordNumbers(:,2);
counterArray = table2array(counter);
totalWords = sum(counterArray);
add this table2array line and change the input of sum with this
Ahmed Alsaadi
Ahmed Alsaadi 2018-12-20
It works now, thank you very much Omer.

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Display and Presentation 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by