How to skip lines that start with a certain character while reading a text file

I have a text file with two columns and a number of rows. The blocks of data are separated by text lines that start with #. How can I load only the data and skip the # lines?
  4 comments
christian_00 2024-6-18
I'm sorry, this is my first time here. I entered it on the question page under the "Release" option, but maybe others can't see it.
dpb 2024-6-18
Hmmmm....I don't see the release information in the Q? itself. I use the compact format, but I'd think it should still show up if the user specified it. Oh, I see: it's over in the right-hand column with all the other stuff I never pay attention to, not part of the Q? itself. I'll have to try to remember to look there, but nobody else caught it either, including the MathWorks employee, so it clearly isn't in the most suitable location.
Anyway, did you see my followup Answer given the release? readtable should solve your problem as a one-liner.


Accepted Answer

dpb 2024-6-18
Given the new information that you are on R2018a, which predates the functions used in all the answers initially given (readmatrix arrived in R2019a and readlines in R2020b), the easiest high-level tool is readtable; it goes back to R2013b.
tData=readtable('yourfile.txt','CommentStyle','#');
Alternatively, as mentioned in earlier sidebar conversation, reverting to the venerable textread would probably be my second choice even though it is now deprecated.
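To make the one-liner concrete, here is a minimal sketch that writes a small sample file and reads it back. The file name and its contents are made up for illustration, and the 'ReadVariableNames' and 'Delimiter' settings are assumptions that fit a space-separated file with no header row; adjust them to your data.

```matlab
% Build a small sample file (hypothetical contents) so the sketch is self-contained.
fname = fullfile(tempdir, 'sample_readtable.txt');
fid = fopen(fname, 'wt');
fprintf(fid, '# block 1\n1 2\n3 4\n# block 2\n5 6\n');
fclose(fid);

% Lines beginning with # are treated as comments and skipped.
% Assumption: no header row and single spaces between the two columns.
tData = readtable(fname, 'CommentStyle', '#', ...
    'ReadVariableNames', false, 'Delimiter', ' ');

% Convert the table to a plain numeric array if that is what you need.
data = table2array(tData);
```

If you prefer to stay with the table, the columns should be reachable as tData.Var1 and tData.Var2, the default names when no header is read.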

More Answers (4)

dpb 2024-6-18
@Taylor's solution will work, but leaves you with the need to convert the string data to numeric values to use it. For a direct solution, try
data=readmatrix('yourfile.txt','CommentStyle','#');
See the readmatrix documentation for details. readtable supports the same option if a table is desired instead of an array, and is particularly useful if the file has variable names as the first record.

Taylor 2024-6-18
I would just load the data as a string and use the erase function to remove the "#"
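One possible reading of this approach, as a rough sketch only: it drops whole lines that begin with # (rather than erasing just the character) and then converts the remaining text to numbers. The sample file and its contents are invented for the example, and it uses only functions that should be available in R2018a (fileread, splitlines, startsWith, join, sscanf).

```matlab
% Build a small sample file (hypothetical contents) so the sketch is self-contained.
fname = fullfile(tempdir, 'sample_string.txt');
fid = fopen(fname, 'wt');
fprintf(fid, '# block 1\n1 2\n3 4\n# block 2\n5 6\n');
fclose(fid);

% Load the whole file as one string and split it into trimmed lines.
txt = string(fileread(fname));
lines = strtrim(splitlines(txt));

% Drop empty lines and lines that start with #.
lines(lines == "" | startsWith(lines, "#")) = [];

% Convert the remaining text to numbers; two values per row is an
% assumption taken from the question (two columns).
data = sscanf(char(join(lines, newline)), '%f', [2 Inf]).';
```

The extra conversion step at the end is the cost of reading the file as text first.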
  6 comments
Taylor 2024-6-18
@dpb Update from development on the readlines function: "The other functions mentioned are "formatted" text function that expect some structure of the data. readlines is meant simply to read the lines in the file. Its interface is kept minimal on purpose."
dpb 2024-6-18
Edited: dpb 2024-6-20
That makes no sense at all to me..."make things as simple as possible, but not too simple".
I suggest the choice should be the user's, rather than the developer deciding users shouldn't need it, and that the request for the additional option be retained.
While I agree that not all the options available in the other members of the family are applicable to the purpose of readlines, I will argue to the end that skipping whole lines based on comment style is line-reading functionality (as the subject question illustrates) and should be available.



Image Analyst 2024-6-18
Try readlines to get each line as an element of a string array. Then loop over all lines, skipping the ones that start with #:
fprintf('Beginning to run %s.m at %s...\n', mfilename, datetime('now','TimeZone','local','Format','HH:mm:ss'));
allLines = readlines('Data3.txt'); % Read whole file into a string array, each element being one line.
for k = 1 : numel(allLines)
    thisLine = strtrim(allLines{k}); % Strip leading and trailing white space, in case there is any.
    if startsWith(thisLine, '#')
        % Skip lines starting with #
        fprintf('Skipping %s\n', thisLine);
    else
        % Process lines NOT starting with #
        fprintf('Processing %s\n', thisLine);
    end
end
fprintf('Done running %s.m at %s...\n', mfilename, datetime('now','TimeZone','local','Format','HH:mm:ss'));
  2 comments
dpb 2024-6-21
Given the lack of the obvious feature to omit comment lines in readlines, the above could be somewhat abbreviated:
fprintf('Beginning to run %s.m at %s...\n', mfilename, datetime('now','TimeZone','local','Format','HH:mm:ss'));
allLines = strtrim(readlines('Data3.txt'));   % read file, trim lines
allLines(startsWith(allLines,'#')) = [];      % remove comment lines
for k = 1 : numel(allLines)                   % iterate over the remainder
    % Process lines NOT starting with #
    fprintf('Processing %s\n', allLines(k));
end
fprintf('Done running %s.m at %s...\n', mfilename, datetime('now','TimeZone','local','Format','HH:mm:ss'));



Image Analyst 2024-6-18
Try this:
% Open the file for reading in text mode.
% (fullFileName is assumed to already hold the path to the data file.)
fileID = fopen(fullFileName, 'rt');
% Read the first line of the file.
textLine = fgetl(fileID);
lineCounter = 0;
while ischar(textLine)
    textLine = strtrim(textLine); % Strip off white space.
    lineCounter = lineCounter + 1;
    %fprintf('Read %s\n', textLine);
    if startsWith(textLine, '#')
        % Skip lines starting with #
        fprintf('Skipping %s\n', textLine);
    else
        % Process lines NOT starting with #
        fprintf('Processing %s\n', textLine);
    end
    % Read the next line. fgetl returns -1 at the end of the file,
    % which is not a char array, so the while loop ends.
    textLine = fgetl(fileID);
end
% All done reading all lines, so close the file.
fclose(fileID);
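As a possible way to fill in the "Process" step for the two-column file in the question, here is a sketch that collects the numeric rows into an array. The sample file, its contents, and the variable name data are assumptions made for illustration; the loop itself uses only fgetl, strtrim, startsWith and sscanf, so it should also run on R2018a.

```matlab
% Build a small sample file (hypothetical contents) so the sketch is self-contained.
fname = fullfile(tempdir, 'sample_fgetl.txt');
fid = fopen(fname, 'wt');
fprintf(fid, '# header\n1.5 2\n  # indented comment\n3 4.5\n');
fclose(fid);

fileID = fopen(fname, 'rt');
data = zeros(0, 2);                 % one row per data line, two columns assumed
textLine = fgetl(fileID);
while ischar(textLine)              % fgetl returns -1 at the end of the file
    textLine = strtrim(textLine);
    if ~isempty(textLine) && ~startsWith(textLine, '#')
        % Parse the two numbers on this line and append them as a new row.
        data(end+1, :) = sscanf(textLine, '%f', [1 2]); %#ok<AGROW>
    end
    textLine = fgetl(fileID);
end
fclose(fileID);
```

Growing the array inside the loop is fine for small files; for large ones the readtable route is likely to be both shorter and faster.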

Release

R2018a
