How do I skip a file that gives an error when using fileDatastore to loop through a folder of pdfs?

I am mining text from several thousand PDFs in a folder using the Text Analytics Toolbox, and I am using fileDatastore to loop through them. Some of the PDFs are encrypted, which makes extractFileText throw an error. I added a try/catch block to skip those files, but after the error is caught the loop goes back to try and reads the same file again, so it never ends. How do I advance the datastore so that it moves on past the bad file? Here is part of the code:
fds = fileDatastore('File*.pdf','ReadFcn',@extractFileText);
while hasdata(fds)
    % extract and prepare text
    try % be prepared for error such as locked pdf
        text = read(fds); % this is where error occurs
    catch
        disp('encrypted pdf');
        continue
    end
    text = erasePunctuation(text);
    % etc. (other text-parsing)
    ...
end

Accepted Answer

Allen 2019-1-12
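I figured it out. The read call is what advances the datastore to the next file, and if the read function throws an error, the position does not advance, so the same file is read again on the next pass. I solved the problem by setting the read function to fileparts, which does not fail on encrypted PDFs, taking the file name from the info output of read, and then calling extractFileText on that file inside try/catch.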
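Following the same observation, another way to keep read from stalling is to make the read function itself catch the error. The sketch below is only an illustration, not the code from this thread: safeExtractFileText is a hypothetical helper name, and the struct fields Filename, Text and Ok are assumptions chosen for the example. In a script, the local function has to sit at the end of the file (supported since R2016b).

```matlab
% Sketch: a ReadFcn that never throws, so read(fds) always advances.
% safeExtractFileText is a hypothetical helper, not a toolbox function.
fds = fileDatastore('File*.pdf','ReadFcn',@safeExtractFileText);

while hasdata(fds)
    result = read(fds);
    if ~result.Ok
        % extraction failed (for example, an encrypted pdf): skip it
        disp("Skipping unreadable pdf: " + result.Filename);
        continue
    end
    text = erasePunctuation(result.Text);
    % etc. (other text-parsing)
end

function result = safeExtractFileText(filename)
    % Report failure in the return value instead of raising an error.
    result = struct('Filename',filename,'Text',"",'Ok',false);
    try
        result.Text = extractFileText(filename);
        result.Ok = true;
    catch
        % Leave Ok as false; the calling loop decides what to do.
    end
end
```

Because the error is handled inside the read function, the loop body sees every file exactly once and decides per file whether to skip it.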

More Answers (1)

Allen 2020-7-28
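fileparts is a MATLAB function that splits a file name into its parts. Because it is set as the ReadFcn, it is what runs whenever read is called, and it does not fail on encrypted PDFs. Here is the code that worked. The second output of read, info, contains the file name in info.Filename. I then test whether the file is readable by calling extractFileText inside try...catch.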
% First make a datastore that lists the files.
fds = fileDatastore('GQquads/*GQ*.pdf','ReadFcn',@fileparts);
% Assumed initialization (not shown in the original post)
nums = {};
% Loop through files
while hasdata(fds)
    % read the next entry; info.Filename holds the file name
    [~,info] = read(fds);
    % test to see if it is a valid file (can the text be extracted?)
    try
        text = extractFileText(info.Filename);
    catch
        % display name of bad file
        disp(info.Filename)
        continue
    end
    % keep the part of the file name between '-' and '.'
    num = extractBetween(info.Filename,'-','.');
    disp(num)
    nums = [nums num];
    % (text preparation)
end
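Since the datastore is used here only to supply file names, a variation is to loop over its Files property by index, so a failed file cannot hold the loop in place. This is a sketch under assumptions rather than the method used above: extractOrSkip is a hypothetical helper name, and collecting all extracted text in memory is a simplification that may not suit several thousand large PDFs.

```matlab
% Sketch: index-based loop over the datastore's file list.
fds = fileDatastore('GQquads/*GQ*.pdf','ReadFcn',@fileparts);
[texts,skipped] = extractOrSkip(fds.Files,@extractFileText);
fprintf('%d files read, %d skipped\n',numel(texts),numel(skipped));

function [texts,skipped] = extractOrSkip(files,readFcn)
    % files   : cell array of file names (for example, fds.Files)
    % readFcn : function handle that returns the text of one file
    texts = strings(0,1);   % text of each readable file
    skipped = {};           % names of files that raised an error
    for k = 1:numel(files)
        try
            texts(end+1,1) = readFcn(files{k});
        catch err
            % record the bad file and move on to the next index
            skipped{end+1} = files{k};
            fprintf('Skipping %s (%s)\n',files{k},err.message);
        end
    end
end
```

The for loop advances regardless of whether extraction succeeds, and skipped gives a list of the files that could not be read for later inspection.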


Release

R2018b
