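I figured it out. The call to read is what advances the datastore to the next file, and if read throws an error the position does not advance, so the same file is read again on the next pass. I solved the problem by setting the datastore's ReadFcn to fileparts, so that read itself no longer tries to open the pdf, obtaining the filename from that, and then calling extractFileText on that file inside a try,catch block.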
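For anyone wondering what that looks like in code, here is a minimal sketch of the idea, not the exact code from this answer. It assumes an identity function handle as the ReadFcn instead of fileparts (with a single output, fileparts returns the folder rather than the full file name), and the skipped list is a hypothetical addition for keeping track of the files that failed. The point is only that read(fds) now does nothing that can fail on an encrypted pdf, so the datastore always moves on, and the risky extractFileText call sits inside the try,catch.

```matlab
% ReadFcn only hands back the file name, so read(fds) never opens the pdf
% and always advances the datastore to the next file.
% (Assumption: an identity handle is used here in place of fileparts.)
fds = fileDatastore('File*.pdf', 'ReadFcn', @(filename) filename);

skipped = strings(0, 1);          % hypothetical: names of files that failed

while hasdata(fds)
    filename = read(fds);         % cheap call; moves past the current file

    try
        % this is where an encrypted pdf raises an error
        str = extractFileText(filename);
    catch err
        fprintf('Skipping %s: %s\n', filename, err.message);
        skipped(end+1, 1) = string(filename);
        continue                  % safe now: the datastore has already advanced
    end

    str = erasePunctuation(str);
    % etc. (other text-parsing)
end
```

The variable is named str rather than text so that it does not shadow MATLAB's built-in text function.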
How do I skip a file that gives an error when using fileDatastore to loop through a folder of pdfs?
I am mining text from several thousand pdfs in a folder using the Text Analytics Toolbox. I am using fileDatastore to loop through them. Some of the pdfs are encrypted, which gives an error with extractFileText. I have added a try,catch segment to skip those files, but when it catches the error it goes back to try and reads the same file again. The loop never ends. How do I increment the counter so that it will move on past the bad file? Here is part of the code:
fds = fileDatastore('File*.pdf','ReadFcn',@extractFileText);
while hasdata(fds)
    % extract and prepare text
    try % be prepared for error such as locked pdf
        text=read(fds); % this is where error occurs
    catch
        disp('encrypted pdf');
        continue
    end
    text=erasePunctuation(text);
    % etc. (other text-parsing)
    ...
end
Accepted Answer
Allen
2019-1-12
1 Comment
Eniola Oluwakoya
2020-7-28
Hi, could you shed more light on how you made the read function fileparts?