Reading a *.txt document and extracting specific words/phrases

I have a *.txt document and would like to extract words/phrases whose start and end character positions in that document I already know.
For example, a word's start and end character numbers are 711 and 724. I tried to extract it using the following MATLAB code:
filetoread = 'document file path';
fid = fopen(filetoread)
x = zeros(1,1);
while 1
    tline = fgetl(fid);
    if ~ischar(tline), break, end
    x = [x , tline];
end
x(1, 711:724)
In the code I try to store the whole document in a single character array x and print the characters between positions 711 and 724. But the result does not match the expected word. I think the problem is with whitespace, empty lines, and so on.
(I attached a sample document)
I would appreciate any help,
Many thanks

Answers (1)

Azzi Abdelmalek 2016-3-18
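filetoread = 'yourfile.txt';
fid = fopen(filetoread);
k = 1;
v = cell(1,1);
while 1
    tline = fgetl(fid);
    if ~ischar(tline), break, end
    v{k,1} = tline;   % store each line in its own cell
    k = k+1;
end
fclose(fid);
a = cellfun(@(x) strtrim(x), v, 'UniformOutput', false);   % trim leading/trailing whitespace
a(cellfun(@isempty, a)) = [];   % drop empty lines
out = cellfun(@(x) x(10:min(20,numel(x))), a, 'UniformOutput', false)   % characters 10 to 20 of each line (fewer if the line is shorter)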
1 Comment
Shima Asaadi 2016-3-18
Thank you very much for the answer.
In this case each paragraph is handled separately, although empty lines are taken into account. For example, the word with start/end character numbers "570,590" in the original document cannot be extracted this way, because the indices restart at 1 for each paragraph and run only to the length of that paragraph. How can I modify the code to take the whole document at once?
Thank you for your help
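
One possible way to take the whole document at once is to read it into a single character vector with fileread, which keeps newlines and blank lines, and then index that vector directly. The following is only a minimal sketch under stated assumptions: the sample file name, its contents, and the offsets 43 and 55 are made up for illustration, and the offsets are assumed to be 1-based and inclusive, with every newline counted as one character.

```matlab
% Write a small sample file so the example is self-contained.
% The file name and contents are hypothetical, not the attached document.
sampleFile = fullfile(tempdir, 'sample_doc.txt');
fid = fopen(sampleFile, 'w');
fprintf(fid, 'First paragraph.\n\nSecond paragraph with a target phrase.\n');
fclose(fid);

% fileread returns the entire file as one character vector, including
% newline characters and empty lines, so offsets counted over the whole
% document are not shifted the way they are when lines are joined with fgetl.
txt = fileread(sampleFile);

% Assumed 1-based, inclusive character offsets into the whole document.
startIdx = 43;
endIdx = 55;
phrase = txt(startIdx:endIdx);
disp(phrase)
```

If the offsets were produced by a tool that counts from 0, adding 1 to the start index may be needed. Likewise, if the file uses Windows line endings (CR+LF) but the offsets were counted with one character per line break, something like strrep(txt, sprintf('\r\n'), sprintf('\n')) before indexing might bring the positions back in line. Both points depend on how the offsets were generated, which the thread does not state.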
