How to extract part of a text file in MATLAB?

I have opened several XML files and want to extract the relevant text stored in them. The relevant text starts after a certain string of characters in each file, so I tried to use an if statement to extract the text from that point until another marker is reached. That way I would skip most of the meaningless markup and get only the text I want. I tried the following code:
if true
File1 = fopen('Factual1.xml','r');
File2 = fopen('Factual2.xml','r');
File3 = fopen('Colloquial1.xml','r');
File4 = fopen('Colloquial2.xml','r');
File5 = fopen('Hello.xml','r');
File6 = fopen('Hello2.xml','r');
Filenames = {'File1';'File2';'File3';'File4';'File5';'File6'};
B = {0};
for i=File1:File6
A = fscanf(i,'%s');
if ~(strcmp(A,'<w:pw:rsidR="00E3286E"w:rsidRDefault="'))
while((B = fscanf(i,'%c')) ~='\')
B
end
end
end
end
But I keep getting an error saying that the statement B = fscanf(i,'%c') is not valid. Is there another way to scan the contents of each file, character by character, so that I can extract just the text I want?

Answers (2)

Ken Atwell 2013-6-3
I'm guessing you're a C programmer. You can't assign B in the while loop's conditional like you are attempting to do. Use two lines:
B = fscanf(i, '%c');
while B ~= '\'
...
B = fscanf(i, '%c');
end
BTW, I believe your for loop is working "accidentally" because MATLAB tends to assign file identifiers in increasing numeric order, but that is probably not guaranteed.
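The two-line pattern above can be sketched as a complete loop. This is only a minimal sketch under stated assumptions: the sample file, its contents, and the variable names `sampleFile`, `stopChar`, and `extracted` are made up for illustration. Note that `fscanf` with `'%c'` and no size argument reads the rest of the file in one call, so the sketch passes a size of 1 to read a single character at a time and treats an empty result as end of file.

```matlab
% Write a small sample file so the sketch runs on its own (hypothetical data).
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s', 'keep this\drop this');
fclose(fid);

stopChar = '\';                      % stop character from the question
extracted = '';                      % characters collected so far

fid = fopen(sampleFile, 'r');
c = fscanf(fid, '%c', 1);            % size argument 1: read a single character
while ~isempty(c) && c ~= stopChar   % an empty result means end of file
    extracted(end+1) = c;            %#ok<AGROW>
    c = fscanf(fid, '%c', 1);
end
fclose(fid);
delete(sampleFile);                  % remove the temporary sample file

disp(extracted)
```

The `~isempty(c)` check matters for files that never contain the stop character; without it the loop would not terminate cleanly at end of file.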
  4 comments
Samyukta Ramnath 2013-6-4
But I checked, and they always were consecutive integers! They just didn't always start from one. But will do this anyway, to be sure.
Walter Roberson 2013-6-4
MATLAB appears to follow what POSIX does, which is to allocate the first available (lowest numbered) file descriptor. But that does not mean that the results will always be consecutive.
fid1 = fopen('file1');
fid2 = fopen('file2');
fid3 = fopen('file3');
fclose(fid1);
fclose(fid2);
nfid1 = fopen('nfile1');
nfid2 = fopen('nfile2');
nfid3 = fopen('nfile3');
If we assume nothing had been opened before, fid1 will be 3, fid2 will be 4, fid3 will be 5, then 3 and 4 are released, so nfid1 will be 3, nfid2 will be 4, but nfid3 would be the next available, 6, rather than the consecutive 5.
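One way to avoid depending on consecutive identifiers is to store whatever `fopen` returns and loop over that stored list. The following is a rough sketch, not the only way to do it: the temporary files and the names `names`, `fids`, and `contents` are assumptions made for illustration.

```matlab
% Create three small temporary files so the sketch is self-contained.
names = cell(1, 3);
for k = 1:3
    names{k} = [tempname '.txt'];
    fid = fopen(names{k}, 'w');
    fprintf(fid, 'contents of file %d', k);
    fclose(fid);
end

% Open each file and keep the identifier that fopen actually returned.
fids = zeros(1, numel(names));
for k = 1:numel(names)
    fids(k) = fopen(names{k}, 'r');
end

% Loop over the stored identifiers, not over the range fids(1):fids(end).
contents = cell(1, numel(fids));
for k = 1:numel(fids)
    contents{k} = fscanf(fids(k), '%c');   % no size argument: reads the whole file
    fclose(fids(k));
end

% Clean up the temporary files.
for k = 1:numel(names)
    delete(names{k});
end
```

Because the loop indexes into `fids`, it keeps working even if the identifiers have gaps, as in the example above.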



Paul Metcalf 2013-6-4
You are defining B as a cell array, then trying to replace B with a different data type, which is invalid. Try initializing B properly first, e.g. B = cell(m,n); then, to assign data into each cell of the array, use B{1,1} = 'first line of data'; and so on. Your code is really poorly constructed in general. If I have time tonight I'll look at sending you some more tips.
  1 comment
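The preallocation advice might look like the sketch below, which stores one extracted string per file in a cell array. Everything concrete here is an assumption for illustration: the `samples` strings stand in for file contents (in practice each could come from `fileread`), and `startMarker` is a hypothetical marker rather than the one in the real XML. As a side note, MATLAB does allow a variable to be reassigned to a different type, so the preallocation is mainly about clarity and about having one slot per file.

```matlab
% Hypothetical sample data standing in for the contents of the XML files.
samples = {'junk<m>first text\more junk', 'junk<m>second text\tail'};
startMarker = '<m>';   % hypothetical marker; the real one comes from the XML
stopChar = '\';        % stop character from the question

B = cell(numel(samples), 1);              % preallocate one cell per file
for k = 1:numel(samples)
    s = samples{k};
    iStart = strfind(s, startMarker);     % positions where the marker begins
    if isempty(iStart)
        B{k} = '';                        % marker not found: nothing to keep
        continue
    end
    from = iStart(1) + numel(startMarker);    % first character after the marker
    iStop = find(s(from:end) == stopChar, 1); % offset of the first stop character
    if isempty(iStop)
        B{k} = s(from:end);               % no stop character: keep the rest
    else
        B{k} = s(from:from + iStop - 2);  % keep text up to, not including, the stop
    end
end
```

Working on the whole string with `strfind` avoids the character-by-character loop, at the cost of holding each file in memory.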
Samyukta Ramnath 2013-6-4
I think I get your point. You mean that I should first initialize B as a two dimensional matrix, then I can print the text character by character, after applying a checking condition (i.e. keep printing the characters if they aren't equal to some specific character?)
