How do I split a single-column .txt file by line?

Hey Guys,
How would I split a .txt file into smaller files by the number of lines? This was simple to do in Linux, but I can't seem to do it here.
An example of a file is attached (testv2.txt)
EDIT: The .txt files I'm working with are very large, and I need to split them into files with 72,000,000 lines. I can't split the files by size, because for some reason some files are different sizes, and the script I'm using tells time by using the # of lines.
Thanks for the help guys!

Accepted Answer

dpb 2019-8-28
Again, I'd suggest there's no need to actually create multiple text files to do this...several options exist in MATLAB; the simplest is probably to just process the file in chunks of whatever size you wish and calculate statistics or do whatever on each section...something like
fid=fopen('yourfile.txt','r');
NperSet=72E6;                 % set number elements to read per section
ix=0;                         % initialize group index counter
while ~feof(fid)              % go thru the file until run out of data
  data=cell2mat(textscan(fid,'%f',NperSet));  % read the data chunk of set size; leading whitespace (incl. \t) is skipped by default
  if isempty(data), break, end                % only trailing whitespace was left in the file
  ix=ix+1;                    % increment counter
  stats(ix,:)=[mean(data) std(data)];  % compute, save the stats of interest (add others as needed)
  % ... do whatever else needed w/ this dataset here
end
fclose(fid);
You'll want to preallocate the stats array to some reasonable approximation of the size expected and check for overflow, but that's the basic idea...simpler than creating and having to traverse thru a bunch of files to simply process in sequence.
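To make the preallocation and overflow remark concrete, here is a minimal sketch of the same loop with a preallocated `stats` array that is doubled when the guess turns out too small. The demo file, the chunk size of 4, the initial guess `nGuess`, and the choice of mean and standard deviation are all assumptions for illustration, not part of the original answer.

```matlab
% Build a small single-column demo file (stand-in for the real data file)
inFile = [tempname '.txt'];
fid = fopen(inFile, 'wt');
fprintf(fid, '%f\n', (1:10)');
fclose(fid);

NperSet = 4;                  % elements per chunk (72E6 in the thread)
nGuess  = 2;                  % deliberately low guess, to exercise the overflow check
stats   = nan(nGuess, 2);     % preallocate one [mean std] row per expected chunk
ix      = 0;                  % chunk counter

fid = fopen(inFile, 'r');
while ~feof(fid)
    data = cell2mat(textscan(fid, '%f', NperSet));
    if isempty(data), break, end            % only trailing whitespace was left
    ix = ix + 1;
    if ix > size(stats, 1)                  % overflow: double the allocation
        stats = [stats; nan(size(stats))];  %#ok<AGROW>
    end
    stats(ix, :) = [mean(data) std(data)];
end
fclose(fid);

stats = stats(1:ix, :);       % trim the rows that were never filled
delete(inFile);               % remove the demo file
```

Doubling on overflow keeps the number of reallocations small even when the first guess is far off; if the line count is known up front, `ceil(nLines/NperSet)` would give the exact size instead.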
The alternative is to use tall arrays, memmapfile, or one of the other features TMW (The MathWorks) has provided for large datasets. See <Large-files-and-big-data link>
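As one illustration of those large-data features, the sketch below reads a single-column file in fixed-size chunks through a `tabularTextDatastore`, which is also the kind of object a tall array can be built on. This is a swapped-in example rather than anything posted in the thread: the demo file, the `ReadSize` of 4, and the per-chunk mean are assumptions, and it relies on the datastore auto-detecting the single numeric column.

```matlab
% Build a small single-column demo file (stand-in for the real data file)
inFile = [tempname '.txt'];
fid = fopen(inFile, 'wt');
fprintf(fid, '%f\n', (1:10)');
fclose(fid);

% Datastore over the file; there is no header row, so do not read variable names
ds = tabularTextDatastore(inFile, 'ReadVariableNames', false);
ds.ReadSize = 4;              % rows returned per read() call (72E6 in the thread)

chunkMeans = [];              % one entry per chunk (preallocate for real use)
while hasdata(ds)
    T = read(ds);             % table holding the next chunk of rows
    x = T{:, 1};              % numeric column of this chunk
    chunkMeans(end+1, 1) = mean(x); %#ok<AGROW>
end

delete(inFile);               % remove the demo file
```

Only one chunk is held in memory at a time, so the file never has to be split on disk.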
  29 comments
Adam Danz 2019-8-31
Yeah I (still) agree that there's no need to store the segmented data in text files and that dpb's approach is the better one.
dpb 2019-8-31
On the comment about hidden and accepted bugs -- just for the record I did err in my earlier post regarding the comparison/subtraction of polynomial coefficients from observations; the code at that point indeed does correctly detrend the data for the x values selected.
I was, however, still at the point where I hadn't quite determined why the x values were/are being selected as they are for the independent variable in the plots. It probably is OK if they have used this successfully for so long, but it still seems a peculiar way to have coded it if it is just piecing the time series back together and building a time vector from a fixed sample rate; I hadn't yet got my head around what is behind it having been done the way it is.


More Answers (1)

Adam Danz 2019-8-28
Edited: Adam Danz 2019-8-29
This solution is quite fast and uses fgetl() to read in blocks of a text file and saves those blocks to a new text file. You can set the number of rows per block and other parameters at the top of the code. See comments within the code for more detail.
% Set the max number of lines per file. The last file may have fewer rows.
nLinesPerFile = 10000;
% Set the path where the files should be saved
newFilePath = 'C:\Users\name\Documents\MATLAB\datafolder';
% Set the base filename of each new file. They will be appended with a file number.
% For example, 'data' will become 'data_1.txt', 'data_2.txt' etc.
newFileName = 'data';
% Set the file that will be read (better to include the full path)
basefile = 'testv2.txt';
% Open file for reading
fid = fopen(basefile);
fnum = 0; % file number
done = false; % flag that ends while-loop.
while ~done
    % Read in the next block; this assumes the data starts
    % at row 1 of the txt file. If that is not the case,
    % adapt this so that the header rows are skipped.
    tempVec = nan(nLinesPerFile,1);
    for i = 1:nLinesPerFile
        nextline = fgetl(fid);
        if ~ischar(nextline)         % fgetl returns -1 at the end of the file
            done = true;
            tempVec(i:end) = [];     % drop only the unused tail of the block
            break
        else
            tempVec(i) = str2double(nextline);  % NaN if the line is not numeric
        end
    end
    % Write the block to a new text file.
    if ~isempty(tempVec)
        fnum = fnum+1;
        tempFilename = sprintf('%s_%d.txt',newFileName,fnum); % file name only; the full path is built next
        tempFile = fullfile(newFilePath,tempFilename);
        fid0 = fopen(tempFile,'wt');
        fprintf(fid0,'%.6f\n',tempVec);
        fclose(fid0);
        % (optional) display link to folder (winopen is Windows-only)
        disp(['<a href="matlab: winopen(''',newFilePath,''') ">',tempFilename,'</a>', ' saved.'])
    end
end
fclose(fid);
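If the pieces should be exact copies of the original lines, more like what `split -l` does on Linux, a variant is to pass each line through as text instead of converting it with str2double and reprinting it with '%.6f'. The sketch below is only an illustration under assumptions: the demo input file, the temporary output folder, the `part_%d.txt` naming, and the block size of 4 are all made up for the example.

```matlab
% Build a small demo input file and an output folder (stand-ins for real paths)
inFile = [tempname '.txt'];
fid = fopen(inFile, 'wt');
fprintf(fid, '%d\n', 1:10);
fclose(fid);
outDir = tempname;
mkdir(outDir);

nLinesPerFile = 4;            % max lines per output file (72,000,000 in the thread)

fin  = fopen(inFile, 'r');
fout = -1;                    % no output file open yet
fnum = 0;                     % number of output files created so far
nInCurrent = 0;               % lines written to the current output file

txt = fgetl(fin);
while ischar(txt)             % fgetl returns -1 (not char) at end of file
    if fout < 0 || nInCurrent == nLinesPerFile
        if fout >= 0, fclose(fout); end     % close the file that just filled up
        fnum = fnum + 1;
        fout = fopen(fullfile(outDir, sprintf('part_%d.txt', fnum)), 'wt');
        nInCurrent = 0;
    end
    fprintf(fout, '%s\n', txt);             % copy the line text unchanged
    nInCurrent = nInCurrent + 1;
    txt = fgetl(fin);
end
if fout >= 0, fclose(fout); end
fclose(fin);
```

Because nothing is parsed, non-numeric lines and the original number formatting survive the split. The demo files are left in the temporary folder so they can be inspected; `rmdir(outDir, 's')` removes them.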
  5 comments
Adam Danz 2020-6-15
My answer pertains to the main question which asks about text files that have a single column of data.
In your case, check out readmatrix(). If you read the documentation for that function, you'll see optional inputs that specify the line number at which your numeric data start, which will be useful in your case. Also check out readtable() for an alternative.
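A minimal sketch of that suggestion, assuming a file with two header lines above a single numeric column (the file contents and the header count are invented for the example). Note that readmatrix() was introduced in R2019a, so it is not available in the R2018b release listed for this question; readtable() is.

```matlab
% Build a demo file with two header lines above the numeric data
inFile = [tempname '.txt'];
fid = fopen(inFile, 'wt');
fprintf(fid, 'Recorded by logger\nvalue\n');
fprintf(fid, '%f\n', [1.5 2.5 3.5]);
fclose(fid);

% readmatrix (R2019a or later): skip the header lines, return a numeric column
data = readmatrix(inFile, 'NumHeaderLines', 2);

% readtable alternative: same idea, but the result is a table
T = readtable(inFile, 'HeaderLines', 2, 'ReadVariableNames', false);
col = T{:, 1};                % extract the numeric column from the table

delete(inFile);               % remove the demo file
```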

Release

R2018b
