Reading text file word by word
Input is the attached text file, which contains one long word, a newline, and then several equal-length short words separated by whitespace. I would like to read the first word in a variable, then the other ones in another variable one by one, rather than all into a single string or into a single huge cell array. I tried to do this with fscanf in several ways but failed, and I even got the impression that fscanf does not comply with its documentation (https://fr.mathworks.com/help/matlab/ref/fscanf.html). I have no clue what I am doing wrong.
fileID = fopen('dataset_300_8.txt');
long_word = fscanf(fileID, '%[$ACGT]'); % is there another way to stop reading at newline?
short_word = ' ';
while ~isempty(short_word)
    short_word = fscanf(fileID, '%s'); % does not work: shouldn't %s stop as it encounters a white space?
    % short_word = fscanf(fileID, '%10s'); % this also does not work
    % short_word processing code here
end
fclose(fileID);
Accepted Answer
jonas
2018-10-13
Edited: jonas
2018-10-13
Try this minor change (note the + added after %s):
short_word = fscanf(fileID, '%s+');
Edit: After further testing, any literal character after the %s gives the same result, because it causes fscanf to stop reading (due to the mismatch). The next call begins where the previous one stopped, that is, at the next word.
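For illustration, here is a minimal sketch of the loop with this change applied. The sample file contents are made up and only stand in for dataset_300_8.txt, the cell array short_words exists only to make the result visible, and fgetl is used for the first line as one possible way to stop at the newline (that choice is an assumption, not part of the answer above).

```matlab
% Hypothetical sample data standing in for dataset_300_8.txt.
fname = [tempname '.txt'];
fileID = fopen(fname, 'w');
fprintf(fileID, 'ACGTACGTACGT\nACG TTA GGC\n');
fclose(fileID);

fileID = fopen(fname, 'r');
long_word = fgetl(fileID);               % first line, with the newline removed
short_words = {};                        % collected here only to show the result
short_word = fscanf(fileID, '%s+');      % the literal '+' fails to match, so one word is read
while ~isempty(short_word)
    short_words{end+1} = short_word;     %#ok<AGROW> replace with per-word processing
    short_word = fscanf(fileID, '%s+');  % resumes at the next word
end
fclose(fileID);
delete(fname);
```

This relies on the words never containing a + character; if they could, a different literal that cannot occur in the data would be needed.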
6 Comments
jonas
2018-10-13
Edited: jonas
2018-10-13
+ just means to continue reading characters until something else is encountered. It is used in textscan so I just assumed it applies here as well.
"A = fscanf(fileID,formatSpec) reads data from an open text file into column vector A and interprets values in the file according to the format specified by formatSpec. The fscanf function reapplies the format throughout the entire file and positions the file pointer at the end-of-file marker. If fscanf cannot match formatSpec to the data, it reads only the portion that matches and stops processing."
So the formatSpec %s reads all characters, skips whitespace, and returns a single long character sequence, whereas %c does the same but retains the whitespace.
This means that adding any literal character after %s forces the scan to stop, and the file pointer is left where the scan failed due to the mismatch. If you call fscanf again, it continues reading from where it previously failed.
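A small sketch of the difference between %s and %c when the format is reapplied over the whole input. The file contents are hypothetical, and the variable names joined and raw are chosen only for this example.

```matlab
% Hypothetical one-line sample file.
fname = [tempname '.txt'];
fileID = fopen(fname, 'w');
fprintf(fileID, 'ACG TTA GGC');
fclose(fileID);

fileID = fopen(fname, 'r');
joined = fscanf(fileID, '%s');   % format is reapplied; whitespace between words is dropped
frewind(fileID);                 % go back to the start of the file
raw = fscanf(fileID, '%c');      % every character, whitespace included
fclose(fileID);
delete(fname);

disp(joined);                    % ACGTTAGGC
disp(raw);                       % ACG TTA GGC
```

This is why a plain '%s' in the question's loop returned everything at once. The size argument, as in fscanf(fileID, '%s', 1), is another commonly used way to limit a call to a single field; that variant does not come from this thread.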
More Answers (1)
Image Analyst
2018-10-13
"I would like to read the first word in a variable, then the other ones in another variable one by one, rather than all into a single string or into a single huge cell array." <--- this is a really bad idea. I'm sure Stephen will soon give you the reasons why.
A better solution is to use fileread() followed by strsplit() to build a single cell array.
str = fileread('dataset_300_8.txt'); % Read the entire file into one character vector.
ca = strsplit(strtrim(str)); % Split on any whitespace (spaces and newlines), one word per cell.
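As a rough sketch of how the cell array could then be used, the first cell holds the long word and the remaining cells can be processed one at a time. The sample file is made up for the example, and the split here is on all whitespace (the default for strsplit) so that the newline after the first word also acts as a separator.

```matlab
% Hypothetical sample data standing in for dataset_300_8.txt.
fname = [tempname '.txt'];
fileID = fopen(fname, 'w');
fprintf(fileID, 'ACGTACGTACGT\nACG TTA GGC\n');
fclose(fileID);

str = fileread(fname);           % whole file as one character vector
ca = strsplit(strtrim(str));     % default delimiters include space and newline
long_word = ca{1};               % the first word
short_words = ca(2:end);         % the remaining words
for k = 1:numel(short_words)
    short_word = short_words{k};
    % per-word processing would go here
end
delete(fname);
```

The strtrim call removes the trailing newline so that no empty cell appears at the end of ca.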