How would I create a script to read files line-by-line to save memory?

Hey guys,
I've done the MATLAB Onramp, but I still feel extremely confused about what the hell I'm doing and it's frustrating me. I don't even know how to google the right questions, and interpreting pages from this website is a task that alone is like learning another language. Learning German was easier than this, it feels like. So I'm sorry if I'm asking stupid questions, but I feel like I've been thrown into the deep end.
I have a .txt file that is 1,000,000,000 lines long, give or take a few 100,000,000 (no two files are the same length).
It consists of only numbers, with no headers that I'm aware of.
Because of the file size, I cannot load the whole file. It needs to be read in portions. I'd rather not split the file or
I'm looking to gather variance data every 100,000 data points, to be organized in a single column/multiple row format.
Ideally, I'd also like to have new columns generated every 360 variance data points; however, this isn't as important as generating the variance data first.
Thanks for the help!
6 comments
EL 2019-8-20
Edited: Adam Danz 2019-8-21
I cut off a little section. This is the very top of a file I would use.
EDIT: Here's a script I'm currently using, and the errors I received
%% Loading Files for Input
% Currently, this can only do a single file at a time. Future editions intend to
% have multiple files loaded at once to save time.
prompt = 'Enter the name of the .txt file to run (e.g. Organism_L/D_Media_Temp_mmddyyyy_Signal.txt).';
inputfile = input(prompt, 's');
%% Data Collection Rate
prompt = 'Enter the Data Collection rate(Hz). [20,000]';
Hz=input(prompt);
if isempty(Hz)
Hz=20000;
end
%% Variance (n)
% This designates the amount of data to use for each datapoint generated.
% The standard amount is 5 seconds (100,000 datapoints). If left empty,
% this is the value that will be used. Otherwise, this will be done in
% seconds.
% Variables
% vt = variance time. The time in seconds is the input, which is then
% multiplied by 20,000.
prompt = 'Enter the time length for variance calc in sec (20,000 points/sec) [5 seconds].';
vt=input(prompt);
if isempty(vt)
vt=5;
end
%% Designating file for export
% This is the name of the .txt file that will contain the variance data
prompt = 'Enter the name for the output file (e.g. Organism_L/D_Media_Temp_mmddyyy_VarianceTime).';
outputfile=input(prompt,'s');
%% Initiating the code
% This is intended to be read line-by-line, then generating a single column
% text file of the variance data.
infile=fopen(inputfile);
outfile=fopen(outputfile);
fline=fgetl(infile);
line_index=1;
variancewindow = Hz*vt;
data=zeros(1,variancewindow);
while ischar(dline);
data(line_index) = str2double(dline) ; % str2double = Convert string to double precision value. What does that mean......?
line_index=line_index+1;
if line_index > variancewindow;
line_index=1;
variance_value=variance_function(data);
fprintf(outfile,'%f\n',variance_value);
data=zeros(1,variancewindow);
end
dline=fgetl(infile);
end
fclose(infile);
data=data(data~=0);
variance_value=variance_function(data);
fprintf(outfile,'%f/n',variance_value);
fclose(outfile);s
EDIT 2: The errors
Error using fgets
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in fgetl (line 32)
[tline,lt] = fgets(fid);
Error in NMDIII_Data (line 59)
fline=fgetl(infile);
Just to be clear, this is something I was working on while asking this question. That's why I didn't post it in the original question.
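The "Invalid file identifier" error above means that fopen returned -1, so fgetl had nothing to read from; a common cause is a file name that is not on the current path or is missing its extension. Below is a minimal sketch of the same line-by-line idea with that and a few other slips addressed, assuming one number per line. It checks both fopen calls, opens the output file with 'w' (fopen without a permission argument opens the file read-only), uses one variable name (dline) throughout, writes '\n' rather than '/n', and uses the built-in var in place of variance_function, which is not defined in the post. The function name line_variance_sketch and the count variable are hypothetical choices made for this illustration.

```matlab
function line_variance_sketch(inputfile, outputfile, variancewindow)
% Hypothetical helper: writes one variance per block of variancewindow lines.
[infile, msg] = fopen(inputfile, 'r');
if infile < 0
    error('Could not open "%s": %s', inputfile, msg);
end
[outfile, msg] = fopen(outputfile, 'w');   % 'w' is needed to write to the file
if outfile < 0
    fclose(infile);
    error('Could not open "%s": %s', outputfile, msg);
end

data = zeros(1, variancewindow);
count = 0;                          % number of values stored in data so far
dline = fgetl(infile);
while ischar(dline)                 % fgetl returns -1 (not char) at end of file
    count = count + 1;
    data(count) = str2double(dline);    % text such as '3.14' becomes the number 3.14
    if count == variancewindow
        fprintf(outfile, '%f\n', var(data));
        count = 0;                  % start filling the next block
    end
    dline = fgetl(infile);
end
if count > 0                        % partial block left at the end of the file
    fprintf(outfile, '%f\n', var(data(1:count)));
end
fclose(infile);
fclose(outfile);
end
```

With the defaults from the script, a call might look like line_variance_sketch('signal.txt', 'variance.txt', 20000*5), where both file names are placeholders. Tracking count avoids the data(data~=0) step, which would also discard genuine zero readings.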
Adam Danz 2019-8-21
The methods proposed by myself and Walter involve reading in chunks of data rather than reading in line-by-line (as you're doing with fgets). I suggest you abandon that method and use textscan() instead.


Accepted Answer

Adam Danz 2019-8-21
Edited: Adam Danz 2019-8-21
Here's a demo that shows how to read in multiple lines of a file in chunks. I included lots of comments that explain what's going on. There's a section at the bottom where you can perform whatever operations you want on the data that is being read in. Walter's answer includes the variance calculations you described.
% Set parameters
file = 'x0.txt'; % The file you're reading; it's better to use a full path such as 'C:\Users\name\Documents\x0.txt'
nrows = 5; % number of rows to read in at a time (you can change this to 100000 or whatever)
% Initialize the file for reading
fid = fopen(file);
% Set some loop variables
ignore = 0; % number of rows to ignore at the beginning (headers etc)
done = false; % flag that detects when file is complete
% Loop through until you've read all lines of file. When that
% happens, "done" will be switched to true and the while-loop
% will end.
while ~done
    % Read the next 'nrows'; C will be a 1x1 cell holding a cell array of strings.
    C = textscan(fid, '%s', nrows, 'delimiter', '\n', 'headerlines', ignore);
    % textscan continues from the current file position, so the header
    % rows only need to be skipped on the first read.
    ignore = 0;
    % If C is completely empty, you've finished the file.
    if isempty(C{1})
        % C has no data so the file is finished.
        % Set the "done" flag to true so the while-loop ends
        done = true;
        % Skip the rest of this iteration.
        continue
    end
    % Convert C from a cell array of strings to a numeric vector
    % This assumes the content of the strings are numbers.
    nVec = str2double(C{1});
    % % % % % % % % % % % % % % % % % % %
    % %
    % HERE IS WHERE YOU'LL DO WHATEVER %
    % OPERATIONS YOU NEED TO DO WITH %
    % THE VALUES YOU JUST READ IN. %
    % %
    % % % % % % % % % % % % % % % % % % %
end
% Close file
fclose(fid);
2 comments
Walter Roberson 2019-8-21
I do not see the purpose of the frewind(). textscan() will continue from the current file position.
Adam Danz 2019-8-21
Nice catch, Walter. I originally copied a similar code that uses fgetl() and adapted it to this but I guess I overlooked the frewind. I edited and fixed it. Thanks.
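Walter's point can be checked on a small file. The sketch below writes the numbers 1 to 12 to a temporary file (the use of tempname and the chunk size of 5 are only for illustration) and reads them back five at a time. Each textscan call picks up where the previous one stopped, so neither frewind nor a growing 'headerlines' count is needed.

```matlab
% Write a small test file with one number per line.
file = [tempname '.txt'];
fid = fopen(file, 'w');
fprintf(fid, '%d\n', 1:12);
fclose(fid);

% Read it back in chunks of 5 values.
fid = fopen(file, 'r');
chunks = {};
while true
    C = textscan(fid, '%f', 5);     % continues from the current file position
    if isempty(C{1})
        break                       % nothing left to read
    end
    chunks{end+1} = C{1}.';         %#ok<SAGROW> keep each chunk as a row vector
end
fclose(fid);
delete(file);

% chunks now holds three pieces: 1:5, 6:10 and 11:12
```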


More Answers (1)

Walter Roberson 2019-8-20
vary_every = 100000; % data points per variance value
expected_buffers = 10000; % 1000000000 / 100000
group_every = 360;
variances = zeros(1, expected_buffers);
filename = 'YourFileNameHere';
[fid, msg] = fopen(filename, 'r');
if fid < 0
    error('Failed to open file "%s" because "%s"', filename, msg)
end
buffcount = 0;
while true
    this_buffer = cell2mat( textscan(fid, '%f', vary_every) );
    if isempty(this_buffer); break; end % end of file
    buffcount = buffcount + 1;
    variances(buffcount) = var(this_buffer);
end
fclose(fid);
variances(buffcount+1:expected_buffers) = []; % trim off any extra
leftover = mod(buffcount, group_every);
if leftover ~= 0
    variances(end+1:end+group_every-leftover) = nan; % pad the last column
end
variances = reshape(variances, group_every, []);
disp(variances)
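The padding and reshape at the end of this answer can be seen on a toy case. In the sketch below, seven made-up variance values are grouped three per column (a stand-in for 360): the last column is filled out with NaN so that reshape gets a whole number of columns. The values, the group size, and the name grouped are placeholders chosen for illustration.

```matlab
variances = [0.5 0.7 0.6 0.9 0.4 0.8 0.3];   % made-up values, one per buffer
group_every = 3;                              % stand-in for 360

% Pad with NaN so the length is a multiple of group_every.
leftover = mod(numel(variances), group_every);
if leftover ~= 0
    variances(end+1 : end+group_every-leftover) = nan;
end

% Each column now holds group_every consecutive variances.
grouped = reshape(variances, group_every, []);
disp(grouped)
```

Here grouped is 3-by-3: the first two columns are full and the third holds 0.3 followed by two NaN entries. reshape fills column by column, which is why the padding goes at the end of the vector.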


Release

R2018a
